[R] regex
Ivan Calandra
c@|@ndr@ @end|ng |rom rgzm@de
Tue Sep 17 16:52:31 CEST 2019
Thank you Bert.
That's more like what I was looking for.
Could you please tell me where I can find information on the "\\1"? This
is the part I still don't get.
Ivan
--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
On 17/09/2019 16:42, Bert Gunter wrote:
> (For the units)
>
> Why not simply:
>
> sub(".*\\[(.+)\\]","\\1", headers)
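On the "\\1": in the replacement argument of sub() and gsub(), "\\1" is a backreference to whatever the first parenthesised group in the pattern matched ("\\2" refers to the second group, and so on up to "\\9"). This is documented under the `replacement` argument in ?sub, and the pattern syntax is described in ?regex. A small sketch using the headers from this thread; the word-swap line is only an extra illustration:

```r
headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
             "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")

# ".*\\["  consumes everything up to and including the "["
# "(.+)"   captures the text between the brackets as group 1
# "\\]"    matches the closing "]"
# The whole header is matched, and "\\1" puts back only group 1.
units.var <- sub(".*\\[(.+)\\]", "\\1", headers)
print(units.var)

# Two groups, written back in reverse order via "\\2 \\1".
swapped <- sub("(\\w+) (\\w+)", "\\2 \\1", "hello world")
print(swapped)
```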
>
> Cheers,
> Bert
>
>
> On Tue, Sep 17, 2019 at 6:40 AM Ivan Calandra <calandra using rgzm.de> wrote:
>
> Thank you, Ivan, for your help!
>
> Your solution for the first problem is so simple I didn't even think
> about it!
> What I find weird is that "_w_|\\.csv$" works as expected ("OR"), but is
> there no way to combine two patterns with an "AND"?
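Regular expressions have no general AND operator. Writing the parts in the order they must occur ("_w_.*\\.csv$") works here, because ".csv" has to come last anyway. When the order is not fixed, two common options are to combine two grepl() tests with &, or to use a Perl-style lookahead with perl = TRUE. A sketch with made-up file names; note that list.files() has no perl argument, so the lookahead version would have to be applied to its result afterwards:

```r
# Hypothetical file names, for illustration only.
files <- c("a_w_1.csv", "b_x_2.csv", "c_w_3.txt", "d.csv_w_")

# Option 1: two independent tests joined with the logical AND operator.
both <- grepl("_w_", files) & grepl("\\.csv$", files)
and.grepl <- files[both]

# Option 2: one pattern; the lookahead (?=.*_w_) checks for "_w_" anywhere
# without consuming characters, then .*\\.csv$ checks the ending.
and.lookahead <- files[grepl("^(?=.*_w_).*\\.csv$", files, perl = TRUE)]

print(and.grepl)
print(and.lookahead)
```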
>
> Unfortunately, your solution to the second problem looks even more
> complicated to me than the gsub() solution. But I'm glad I can learn
> about regmatches() and regexpr()!
>
> Best,
> Ivan
>
> On 17/09/2019 09:14, Ivan Krylov wrote:
> > On Tue, 17 Sep 2019 08:48:43 +0200
> > Ivan Calandra <calandra using rgzm.de> wrote:
> >
> >> CSVs <- list.files(path=..., pattern="\\.csv$")
> >> w.files <- CSVs[grep(pattern="_w_", CSVs)]
> >>
> >> Of course, what I would like to do is list only the interesting files
> >> from the beginning, rather than subsetting the whole list of files.
> > One way to express that would be "_w_.*\\.csv$", meaning that the
> > file name has to contain "_w_", followed by anything (any character
> > repeated any number of times, including 0), followed by ".csv" at the
> > end of the string.
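A self-contained sketch of that pattern passed straight to list.files(). The folder and file names are made up for the demo and are created in a temporary directory:

```r
# Hypothetical demo folder inside the session's temporary directory.
demo.dir <- file.path(tempdir(), "regex-demo")
dir.create(demo.dir, showWarnings = FALSE)
invisible(file.create(file.path(demo.dir,
  c("s1_w_a.csv", "s1_d_a.csv", "s2_w_b.csv", "s2_w_b.txt"))))

# "_w_" somewhere, then anything, then ".csv" at the very end.
w.files <- list.files(path = demo.dir, pattern = "_w_.*\\.csv$")
print(w.files)

unlink(demo.dir, recursive = TRUE)  # remove the demo files again
```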
> >
> >> 2) The units of the variables are given in the original headers. I
> >> would like to extract the units. This is what I did:
> >>
> >> headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
> >>              "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")
> >> units.var <- gsub(pattern="^.*\\[|\\]$", "", headers)
> >>
> >> Using gsub() seems overly complicated to me. Isn't there a way to
> >> extract what is interesting rather than deleting what is not?
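For reference, the gsub() call above works because the pattern has two alternatives joined by "|" ("^.*\\[" for everything up to the "[", and "\\]$" for the final "]"), and gsub() deletes every match, whereas sub() deletes only the first one. A quick comparison on the same headers:

```r
headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
             "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")

# gsub(): both the leading "...[" and the trailing "]" are removed.
units.gsub <- gsub(pattern = "^.*\\[|\\]$", replacement = "", headers)

# sub(): only the first (leftmost) match is removed, so the "]" survives.
units.sub <- sub(pattern = "^.*\\[|\\]$", replacement = "", headers)

print(units.gsub)
print(units.sub)
```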
> > Pure-R way: use regmatches() + regexpr(). Both regmatches() and
> > regexpr() take the character vector as an argument, so duplication is
> > hard to avoid:
> >
> > units <- regmatches(headers, regexpr('\\[.*\\]', headers))
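What the two steps return, as a sketch: regexpr() gives the start position of the first match in each element (with the match length in its "match.length" attribute, and -1 where nothing matches), and regmatches() cuts those substrings out. The brackets are part of the match, and elements without a match are dropped, so the result can be shorter than headers if a header has no unit. The extra "count" header below is hypothetical:

```r
headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
             "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")

m <- regexpr("\\[.*\\]", headers)   # positions and lengths of the matches
units <- regmatches(headers, m)     # the matched text, brackets included

# Drop the first and last character, i.e. the "[" and "]".
units.clean <- substr(units, 2, nchar(units) - 1)

# A header without brackets has no match and silently disappears.
more <- c(headers, "count")
n.kept <- length(regmatches(more, regexpr("\\[.*\\]", more)))

print(units)
print(units.clean)
```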
> >
> > The stringr package has a str_match() function with a nicer interface:
> >
> > str_match(headers, '\\[.*\\]') -> units
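str_match() returns a character matrix whose first column is the whole match, followed by one column per capture group, so with a group in the pattern (str_match(headers, '\\[(.*)\\]')[, 2]) you get the units without brackets. A base-R sketch of the same idea, in case stringr is not at hand, uses regexec(), which records the parenthesised groups as well as the whole match:

```r
headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
             "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")

# regexec() stores the whole match plus each parenthesised group.
m <- regexec("\\[(.*)\\]", headers)

# For each header: c(whole match, group 1), e.g. c("[mm]", "mm").
parts <- regmatches(headers, m)

# Keep element 2 (group 1); a header with no match would give NA here.
units <- vapply(parts, function(p) p[2], character(1))
print(units)
```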
> >
> > Such "greedy" patterns containing ".*" present a few pitfalls. For
> > example, looking for text in parentheses using the pattern "\\(.*\\)"
> > in "...(abc)...(def)..." will match the whole "(abc)...(def)" instead
> > of the single groups "(abc)" and "(def)"; with your examples, though,
> > the pattern should work as presented. One other option would be to ask
> > for "[", followed by zero or more characters that are not "]", followed
> > by "]": '\\[[^]]*\\]'.
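A sketch comparing the greedy pattern with the negated-class one on the example string from this paragraph. The lazy quantifier ".*?" with perl = TRUE is a third option not mentioned in the thread, included only for comparison:

```r
x <- "...(abc)...(def)..."

# Greedy: ".*" runs to the last ")", so one match spans both groups.
greedy <- regmatches(x, gregexpr("\\(.*\\)", x))[[1]]

# Negated class: "[^)]*" cannot cross a ")", so each group matches alone.
negated <- regmatches(x, gregexpr("\\([^)]*\\)", x))[[1]]

# Lazy quantifier (PCRE): ".*?" stops at the first possible ")".
lazy <- regmatches(x, gregexpr("\\(.*?\\)", x, perl = TRUE))[[1]]

# The same negated-class idea applied to the bracketed units.
headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
             "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")
units <- regmatches(headers, regexpr("\\[[^]]*\\]", headers))

print(greedy)
print(negated)
print(units)
```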
> >