[R] Regex: workaround for variable length negative lookbehind
Stefan Th. Gries
stgries at gmail.com
Sun Nov 30 20:33:21 CET 2008
Hi all
I have the following regular expression problem: I want to find
complete elements of a vector that end in a repeated character but
where the repetition doesn't make up the whole word. That is, for the
vector vec:
vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")
I would like to get
"baaa"
"bbaa"
"baamm"
From tools where negative lookbehind can involve variable lengths, one
would think this would work:
grep("(?<!(?:\\1|^))(.)\\1{1,}$", vec, perl=T)
But then R doesn't like it that much ... I also know I can get it like this:
whole.word.rep <- grep("^(.)\\1{1,}$", vec, perl=TRUE) # whole word is one repeated character: 1 6
rep.at.end <- grep("(.)\\1{1,}$", vec, perl=TRUE) # ends in a repeated character: 1 2 3 5 6
setdiff(rep.at.end, whole.word.rep) # 2 3 5
But is there a one-line grep thingy to do this?
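One possible one-line sketch, not from the original post and assuming perl=TRUE (PCRE) matching: instead of a variable-length negative lookbehind, put a negative lookahead at the start of the string to reject words that consist of a single repeated character, then require a repeated character at the end. The names one.line and two.step are only illustrative, and the two-step version is included to check that both approaches agree on this vector.

```r
# Same test vector as in the post
vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")

# ^(?!(.)\\1+$)  negative lookahead: the whole string must NOT be one repeated character
# .*(.)\\2+$     the string must end in a character repeated at least once
# Group 1 lives inside the lookahead, so the trailing repetition uses group 2.
one.line <- grep("^(?!(.)\\1+$).*(.)\\2+$", vec, perl = TRUE)

# Two-step approach from the post, for comparison
whole.word.rep <- grep("^(.)\\1+$", vec, perl = TRUE)
rep.at.end <- grep("(.)\\1+$", vec, perl = TRUE)
two.step <- setdiff(rep.at.end, whole.word.rep)

print(vec[one.line])            # "baaa" "bbaa" "baamm"
print(identical(one.line, two.step))
```

The lookahead is anchored at ^, so its length does not matter to the regex engine, which is what sidesteps the fixed-length restriction on lookbehind.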
Thx for any pointers,
STG