[R] Regex: workaround for variable length negative lookbehind
Stefan Evert
stefan.evert at uos.de
Sun Nov 30 20:59:54 CET 2008
Hi Stefan! :-)
> From tools where negative lookbehind can involve variable lengths, one
> would think this would work:
>
> grep("(?<!(?:\\1|^))(.)\\1{1,}$", vec, perl=T)
>
> But then R doesn't like it that much ...
It's really the PCRE library that doesn't like your regexp, not R.
The problem is that negative lookbehind is only possible with a
fixed-length expression, and since \1 may hold an arbitrary string, the
PCRE library can't be sure it's just a single character. I'm also
surprised that you're allowed to use \1 before defining it.
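To see the restriction in isolation, here is a minimal sketch with two toy patterns (they are my own illustrations, not taken from your question): a single-character lookbehind compiles, while one of unbounded length should be rejected by PCRE, so grepl() signals an error.

```r
# Fixed-length lookbehind: accepted by PCRE.
fixed.ok <- grepl("(?<!a)b", c("ab", "cb"), perl = TRUE)

# Lookbehind of unbounded length: PCRE refuses to compile the pattern,
# so grepl() signals an error, which try() captures here.
variable.bad <- try(grepl("(?<!a+)b", "ab", perl = TRUE), silent = TRUE)
inherits(variable.bad, "try-error")
```

The exact wording of the error depends on the PCRE version R was built against.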
>
> But is there a one-line grep thingy to do this?
Can't think of a one-liner, but here is a three-line solution that you
can easily wrap in a small function:
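vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")
idx.1 <- grep("(.)\\1$", vec)     # strings ending in a doubled character
idx.2 <- grep("^(.)\\1*$", vec)   # strings made up of one repeated character
vec[setdiff(idx.1, idx.2)]        # in the first set but not in the second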
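Wrapped in a function, this might look roughly as follows. It's only a sketch: the name ends.in.run is made up for illustration, and I've used grepl() with logical vectors in place of grep() and setdiff(), which should select the same elements.

```r
# Hypothetical helper (name chosen for illustration only): return the
# elements of x that end in a doubled character but do not consist of
# a single repeated character throughout.
ends.in.run <- function(x) {
  ends.doubled <- grepl("(.)\\1$", x)     # last two characters identical
  all.same     <- grepl("^(.)\\1*$", x)   # whole string is one character
  x[ends.doubled & !all.same]
}

vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")
result <- ends.in.run(vec)
result   # "baaa" "bbaa" "baamm"
```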
Cheers,
Stefan
--
The wonders of Googleology (episode 1)
"from collectibles to cars"
84,700,000 -- Google
9,443,672 -- Google N-grams (Web 1T5)
1 -- ukWaC
[ stefan.evert at uos.de | http://purl.org/stefan.evert ]