[R] regex

Bert Gunter
Tue Sep 17 17:04:32 CEST 2019


?regexp   ## Search the help text for "backreference" (or do a web search for
"regular expression backreference").
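
As a rough illustration of what the "\\1" does, here is a minimal sketch using
the headers quoted further down in this thread. The object names units.var and
swapped are only for illustration, and the second call is an extra example that
was not part of the original suggestion.

```r
# Headers as quoted later in this thread
headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
             "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")

# The parentheses in the pattern define capture group 1: whatever (.+)
# matches between "[" and "]". In the replacement string, "\\1" refers back
# to that captured text, so the whole match is replaced by the group alone.
units.var <- sub(".*\\[(.+)\\]", "\\1", headers)
print(units.var)

# With two groups, "\\1" and "\\2" refer to them in the order of their
# opening parentheses; here they are swapped in the replacement.
swapped <- sub("(.*) \\[(.+)\\]", "\\2: \\1", headers[1])
print(swapped)
```

The backslash is doubled only because R string literals need "\\" to produce a
single backslash; the regex engine itself sees \1.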

-- Bert


On Tue, Sep 17, 2019 at 7:52 AM Ivan Calandra <calandra using rgzm.de> wrote:

> Thank you Bert.
> That's more like what I was looking for.
>
> Could you please tell me where I can find information on the "\\1"? This
> is the part I still don't get.
>
> Ivan
>
> --
> Dr. Ivan Calandra
> TraCEr, laboratory for Traceology and Controlled Experiments
> MONREPOS Archaeological Research Centre and
> Museum for Human Behavioural Evolution
> Schloss Monrepos
> 56567 Neuwied, Germany
> +49 (0) 2631 9772-243
> https://www.researchgate.net/profile/Ivan_Calandra
>
> On 17/09/2019 16:42, Bert Gunter wrote:
>
> (For the units)
>
> Why not simply:
>
> sub(".*\\[(.+)\\]","\\1", headers)
>
> Cheers,
> Bert
>
>
> On Tue, Sep 17, 2019 at 6:40 AM Ivan Calandra <calandra using rgzm.de> wrote:
>
>> Thank you Ivan for your help!
>>
>> Your solution for the first problem is so simple I didn't even think
>> about it!
>> What I find weird is that "_w_|\\.csv$" works as expected ("OR"), but is
>> there no way to combine two patterns with an "AND"?
>>
>> Your solution to the second problem is unfortunately even more
>> complicated to me than the gsub() solution. But I'm glad I can learn
>> about regmatches() and regexpr()!
>>
>> Best,
>> Ivan
>>
>> On 17/09/2019 09:14, Ivan Krylov wrote:
>> > On Tue, 17 Sep 2019 08:48:43 +0200
>> > Ivan Calandra <calandra using rgzm.de> wrote:
>> >
>> >> CSVs <- list.files(path=..., pattern="\\.csv$")
>> >> w.files <- CSVs[grep(pattern="_w_", CSVs)]
>> >>
>> >> Of course, what I would like to do is list only the interesting files
>> >> from the beginning, rather than subsetting the whole list of files.
>> > One way to express that would be "_w_.*\\.csv$", meaning that the
>> > filename has to have "_w_" in it, followed by anything (any character
>> > repeated any number of times, including 0), followed by ".csv" at the
>> > end of the line.
>> >
>> >> 2) The units of the variables are given in the original headers. I
>> >> would like to extract the units. This is what I did:
>> >>
>> >> headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
>> >>              "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")
>> >> units.var <- gsub(pattern="^.*\\[|\\]$", "", headers)
>> >>
>> >> It seems to me overly complicated using gsub(). Isn't there a way
>> >> to extract what is interesting rather than deleting what is not?
>> > Pure-R way: use regmatches() + regexpr(). Both regmatches and regexpr
>> > take the character vector as an argument, so duplication is hard to
>> > avoid:
>> >
>> > units <- regmatches(headers, regexpr('\\[.*\\]', headers))
>> >
>> > The stringr package has an str_match() function with a nicer interface:
>> > str_match(headers, '\\[.*\\]') -> units.
>> >
>> > Such "greedy" patterns containing ".*" present a few pitfalls, e.g.
>> > looking for text in parentheses using the pattern "\\(.*\\)" in
>> > "...(abc)...(def)..." will match the whole "(abc)...(def)" instead of
>> > single groups "(abc)" and "(def)", but with your examples the pattern
>> > should work as presented. One other option would be to ask for "[",
>> > followed by zero or more characters that are not "]", followed by "]":
>> > '\\[[^]]*\\]'.
>> >
>>