[R] overlapping pattern match (errata 2.0)

james.holtman@convergys.com james.holtman at convergys.com
Sat Mar 29 18:02:54 CET 2003

Another way to find all the multiple occurrences of a character in a string
is to use 'rle':

> x.s <- 'aaabbcdeeeffffggiijjysbbddeffghjjjsdkkkkk'
> x <- unlist(strsplit(x.s, NULL))
> x
 [1] "a" "a" "a" "b" "b" "c" "d" "e" "e" "e" "f" "f" "f" "f" "g" "g" "i" "i" "j"
[20] "j" "y" "s" "b" "b" "d" "d" "e" "f" "f" "g" "h" "j" "j" "j" "s" "d" "k" "k"
[39] "k" "k" "k"
> rle(x)
Run Length Encoding
  lengths: int [1:21] 3 2 1 1 3 4 2 2 2 1 ...
  values : chr [1:21] "a" "b" "c" "d" "e" "f" "g" "i" "j" "y" "s" "b" "d" "e" "f" "g" ...

When the lengths are >1, the corresponding 'values' are the repeated
characters.
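
As a rough sketch of how the 'rle' result could be used (the names r,
repeated, runs and aa.count are illustrative only and do not come from the
original message), one might keep just the runs longer than one character
and derive the overlapping count of "aa" from them. This assumes that a run
of n identical characters contains n - 1 overlapping pairs:

```r
# Sketch only: the variable names below are assumptions for illustration.
x.s <- 'aaabbcdeeeffffggiijjysbbddeffghjjjsdkkkkk'
x <- unlist(strsplit(x.s, NULL))

r <- rle(x)
repeated <- r$lengths > 1                 # TRUE for runs of two or more

# Table of the repeated characters and how long each run is
runs <- data.frame(value  = r$values[repeated],
                   length = r$lengths[repeated],
                   stringsAsFactors = FALSE)
print(runs)

# A run of n identical characters holds n - 1 overlapping pairs,
# so the overlapping count of "aa" is the sum over all runs of "a".
aa.count <- sum(r$lengths[r$values == "a"] - 1)
print(aa.count)
```

This only works for patterns made of a single repeated character; a pattern
such as "ab" needs a different approach.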

Well! Excuse me again, but...

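your.string <- "aaacdf"
nc1 <- nchar(your.string)-1   # number of adjacent character pairs
x <- unlist(strsplit(your.string, NULL)) ######## CORRECT
x2 <- c()
# build every overlapping two-character window: "aa" "aa" "ac" "cd" "df"
# (seq_len() avoids the 1:0 trap when the string has a single character)
for (i in seq_len(nc1))
  x2 <- c(x2, paste(x[i], x[i+1], sep="")) ######## ERRATA 2
cat("occurrences of <aa> in <your.string>: ", length(grep("aa", x2)),
    sep="", fill=TRUE)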
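
The same windowing idea can be extended to a fixed pattern of any length.
The following is only a sketch: count.overlapping is a hypothetical helper
name, not something from this thread or from base R, and it treats the
pattern as a literal string rather than a regular expression.

```r
# Sketch only: count.overlapping is a hypothetical helper, not a base R function.
count.overlapping <- function(pattern, string) {
  n <- nchar(string)
  k <- nchar(pattern)
  if (k == 0 || k > n) return(0L)          # nothing can match
  starts <- seq_len(n - k + 1)             # every possible start position
  # substring() is vectorised, so this extracts all windows of width k at once
  windows <- substring(string, starts, starts + k - 1)
  sum(windows == pattern)                  # literal comparison, no regex
}

count.overlapping("aa", "aaacdf")
count.overlapping("aaa", "aaaaa")
```

Comparing whole windows with == rather than using grep means characters
that are special in regular expressions, such as "." or "[", need no
escaping.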


PS: sorry again

