[R] & and |

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Fri Aug 21 00:37:43 CEST 2020


The single-regex grep solutions offered for Ivan's problem were fine, but they
do not readily generalize to the conjunction of multiple (>2, say) regex
patterns that can appear anywhere in a string and in any order. However,
this is easily done with the Perl zero-width lookahead construction,
"(?=...)". Each lookahead checks that its pattern occurs somewhere ahead
without consuming any characters, so several lookaheads in a row all test
the same string. For example:
> test <- c("xyCz", "xAyCz", "xAyBzC", "xCByAz", "xACyB", "BAyyC", "CBxBAy")

## to search for strings containing "A", "B", and "C" in any order
> grep("(?=.*A)(?=.*B)(?=.*C)", test, perl = TRUE)
[1] 3 4 5 6 7
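
Applied to Ivan's original file names, the same idea might look like the sketch below. It reuses the `mydata` vector from the message quoted further down. The `Reduce()` variant is not from the thread; it is just one assumed way to extend Richard's `grepl(...) & grepl(...)` to any number of patterns.

```r
# File names from Ivan's original question (quoted below)
mydata <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
            "SSFA-ConfoMap_Lithics_NMPfilled.csv",
            "SSFA-ConfoMap_Sheeps_NMPfilled.csv",
            "SSFA-Toothfrax_GuineaPigs.xlsx",
            "SSFA-Toothfrax_Lithics.xlsx",
            "SSFA-Toothfrax_Sheeps.xlsx")

# One lookahead per required substring; order within the string does not matter
hits_lookahead <- grep("(?=.*ConfoMap)(?=.*GuineaPigs)", mydata,
                       perl = TRUE, value = TRUE)

# Same conjunction without lookaheads: AND together one grepl() per pattern
patterns <- c("ConfoMap", "GuineaPigs")
keep <- Reduce(`&`, lapply(patterns, grepl, x = mydata))
hits_grepl <- mydata[keep]

print(hits_lookahead)
print(identical(hits_lookahead, hits_grepl))
```

Both approaches should select only the first file name; the `grepl()` version avoids `perl = TRUE` at the cost of one pass over the data per pattern.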

Note that this matches strings containing one or more instances of each
pattern. If one wants exactly one instance of each conjunct, then something
like this should do:

> lookfor <- c("A","B","C")
> notme <- paste0("[^",lookfor,"]*")
## the leading "^" makes each lookahead scan the whole string, not just a suffix
> z <- paste0("^", paste0("(?=", notme, lookfor, notme, "$)", collapse = ""))
> grep(z, test, perl = TRUE)
[1] 3 4 5 6
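
For more than a handful of patterns, the construction could be wrapped in a small function. The sketch below is hypothetical: `all_of_pattern()` and its `exactly_once` argument are names made up for illustration, and it assumes each element of `lookfor` is a single literal character that needs no escaping inside `[^...]`. Because a lookahead-only pattern is tried at every start position, the sketch anchors it with `^` so that the exactly-once count applies to the whole string.

```r
# Hypothetical helper (not from the thread): build a PCRE pattern requiring
# every element of `lookfor` to occur, in any order.
# Assumes single literal characters such as "A", "B", "C".
all_of_pattern <- function(lookfor, exactly_once = FALSE) {
  if (exactly_once) {
    notme <- paste0("[^", lookfor, "]*")
    parts <- paste0("(?=", notme, lookfor, notme, "$)")
  } else {
    parts <- paste0("(?=.*", lookfor, ")")
  }
  # The leading ^ pins all lookaheads to the start of the string, so the
  # exactly-once count covers the whole string rather than some suffix.
  paste0("^", paste(parts, collapse = ""))
}

test <- c("xyCz", "xAyCz", "xAyBzC", "xCByAz", "xACyB", "BAyyC", "CBxBAy")

any_count <- grep(all_of_pattern(c("A", "B", "C")), test, perl = TRUE)
once_only <- grep(all_of_pattern(c("A", "B", "C"), exactly_once = TRUE),
                  test, perl = TRUE)

# "AABC" has two "A"s; without the ^ anchor the lookaheads can all succeed
# when the match is attempted from the second character onward.
anchored   <- grepl(all_of_pattern(c("A", "B", "C"), exactly_once = TRUE),
                    "AABC", perl = TRUE)
unanchored <- grepl("(?=[^A]*A[^A]*$)(?=[^B]*B[^B]*$)(?=[^C]*C[^C]*$)",
                    "AABC", perl = TRUE)

print(any_count)
print(once_only)
print(c(anchored = anchored, unanchored = unanchored))
```

On the `test` vector the anchored and unanchored forms agree; they differ only on inputs such as "AABC", where a suffix of the string happens to contain exactly one of each letter.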

Cheers,
Bert




On Wed, Aug 19, 2020 at 11:38 PM Ivan Calandra <calandra using rgzm.de> wrote:

> Thank you all for all the very helpful answers!
>
> Best,
> Ivan
>
> --
> Dr. Ivan Calandra
> TraCEr, laboratory for Traceology and Controlled Experiments
> MONREPOS Archaeological Research Centre and
> Museum for Human Behavioural Evolution
> Schloss Monrepos
> 56567 Neuwied, Germany
> +49 (0) 2631 9772-243
> https://www.researchgate.net/profile/Ivan_Calandra
>
> On 20/08/2020 3:28, Richard O'Keefe wrote:
> > There are & and | operators in the R language.
> > There is an | operator in regular expressions.
> > There is NOT any & operator in regular expressions.
> > grep("ConfoMap&GuineaPigs", mydata, value=TRUE)
> > looks for elements of mydata containing the literal
> > string 'ConfoMap&GuineaPigs'.
> >
> > > foo <- c("a","b","cab","back")
> > > foo[grepl("a",foo) & grepl("b",foo)]
> > [1] "cab"  "back"
> >
> > grepl returns a TRUE/FALSE vector.
> >
> > On Thu, 20 Aug 2020 at 02:53, Ivan Calandra <calandra using rgzm.de> wrote:
> >
> >     Dear useRs,
> >
> >     I feel really stupid, but I cannot understand why "&" doesn't work as I
> >     expect, while "|" does.
> >
> >     I have the following vector:
> >     mydata <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
> >     "SSFA-ConfoMap_Lithics_NMPfilled.csv",
> >     "SSFA-ConfoMap_Sheeps_NMPfilled.csv",
> >     "SSFA-Toothfrax_GuineaPigs.xlsx",
> >     "SSFA-Toothfrax_Lithics.xlsx", "SSFA-Toothfrax_Sheeps.xlsx")
> >     and I want to find the values that include both "ConfoMap" and
> >     "GuineaPigs".
> >
> >     If I do:
> >     grep("ConfoMap&GuineaPigs", mydata, value=TRUE)
> >     it returns an empty vector, character(0).
> >
> >     But if I do:
> >     grep("ConfoMap|GuineaPigs", mydata, value=TRUE)
> >     it returns all the elements that include either "ConfoMap" or
> >     "GuineaPigs", as I would expect.
> >
> >     So what is wrong with my "&" construct? How can I return the elements
> >     that include both parts?
> >
> >     Thank you for your help!
> >     Ivan
> >
> >     ______________________________________________
> >     R-help using r-project.org <mailto:R-help using r-project.org> mailing list --
> >     To UNSUBSCRIBE and more, see
> >     https://stat.ethz.ch/mailman/listinfo/r-help
> >     PLEASE do read the posting guide
> >     http://www.R-project.org/posting-guide.html
> >     and provide commented, minimal, self-contained, reproducible code.
> >
>
