[R] Regex: workaround for variable length negative lookbehind

Gabor Grothendieck ggrothendieck at gmail.com
Sun Nov 30 21:26:43 CET 2008


Try this:

> vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")

> grep("^(?!(.)\\1{1,}$).*(.)\\2{1,}$", vec, perl = TRUE)
[1] 2 3 5

The negative lookahead (?!...) succeeds only if the string is not
one character repeated from start to end. Since a lookahead
consumes no characters, matching then restarts at the beginning
of the string, where .* matches anything and (.)\\2{1,}$ requires
a repeated character at the end.
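
To see how the two conditions combine, here is a small sketch that tests each half of the pattern separately with grepl. The variable names (not_all_same, rep_at_end, combined) are only illustrative and are not part of the original one-liner.

```r
vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")

# Lookahead alone: TRUE where the string is NOT one character repeated throughout
not_all_same <- grepl("^(?!(.)\\1{1,}$)", vec, perl = TRUE)

# Tail pattern alone: TRUE where the string ends in a repeated character
rep_at_end <- grepl("(.)\\1{1,}$", vec, perl = TRUE)

# Full pattern: the group inside the lookahead is group 1,
# so the group that follows it is referenced as \\2
combined <- grepl("^(?!(.)\\1{1,}$).*(.)\\2{1,}$", vec, perl = TRUE)

# The one-line pattern agrees with the logical AND of the two halves
identical(combined, not_all_same & rep_at_end)

# Return the matching strings rather than their indices
vec[combined]
```

Using grep(..., value = TRUE) with the full pattern should give the same strings as vec[combined].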

On Sun, Nov 30, 2008 at 2:33 PM, Stefan Th. Gries <stgries at gmail.com> wrote:
> Hi all
>
> I have the following regular expression problem: I want to find
> complete elements of a vector that end in a repeated character but
> where the repetition doesn't make up the whole word. That is, for the
> vector vec:
>
> vec<-c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")
>
> I would like to get
> "baaa"
> "bbaa"
> "baamm"
>
> From tools where negative lookbehind can involve variable lengths, one
> would think this would work:
>
> grep("(?<!(?:\\1|^))(.)\\1{1,}$", vec, perl=T)
>
> But then R doesn't like it that much ... I also know I can get it like this:
>
> whole.word.rep <- grep("^(.)\\1{1,}$", vec, perl=T) # 1 6
> rep.at.end <- grep("(.)\\1{1,}$", vec, perl=T) # 1 2 3 5 6
> setdiff(rep.at.end, whole.word.rep) # 2 3 5
>
> But is there a one-line grep thingy to do this?
>
> Thx for any pointers,
> STG