[R] Extract Element of String with R's Regex

Gabor Grothendieck ggrothendieck at gmail.com
Fri Aug 1 14:33:48 CEST 2008


On Fri, Aug 1, 2008 at 7:31 AM, Stephen Tucker <brown_emu at yahoo.com> wrote:
> In the example below, a straight application of strsplit() is probably the simplest solution. In a more general case where it may be desirable to match patterns, a combination of sub() or gsub() with strsplit() might do the trick:
>
>> x <- "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669 -965.35"
>> patt <- "Best-K Gene \\d+ (\\w+) (\\w+) Noc= \\d - (\\d) LL= (.*)"
>
>> unlist(strsplit(gsub(patt,"\\1,\\2,\\3",x,perl=TRUE),","))
> [1] "211952_at" "RANBP5"    "2"
>
> Alternatively, you may want to take a look at the gsubfn package - it is quite useful. Still learning to use it myself...
>
>> library(gsubfn)
>> unlist(strapply(x,patt,function(x1,x2,x3) c(x1,x2,x3),backref=-3,perl=TRUE))
> [1] "211952_at" "RANBP5"    "2"
>

This last one can be simplified slightly, since c can be passed directly as the function:

> strapply(x, patt, c, backref = -3, perl = TRUE)[[1]]
[1] "211952_at" "RANBP5"    "2"
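
For comparison, here is a rough base-R sketch of the same extraction that needs no extra package. It is not from the posts above: it assumes an R version that provides regexec() and regmatches(), and it reuses x and patt as defined in the quoted message. The name fields is only illustrative. perl = TRUE is left out on the assumption that R's default regex engine accepts \d and \w for this pattern.

```r
# Base-R sketch (an alternative to strapply, not the method from the thread).
x <- "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669 -965.35"
patt <- "Best-K Gene \\d+ (\\w+) (\\w+) Noc= \\d - (\\d) LL= (.*)"

# regexec() records the position of the full match and of each
# parenthesised group; regmatches() converts those positions to strings.
# Element 1 of the result is the full match, elements 2.. are the groups.
m <- regmatches(x, regexec(patt, x))[[1]]

# Keep the first three captured groups (elements 2 to 4), dropping the
# full match and the trailing "LL=" group.
fields <- m[2:4]
print(fields)
# [1] "211952_at" "RANBP5"    "2"
```

Indexing with m[2:4] plays the same role as backref = -3 in the strapply call: the whole match is skipped and only the first three backreferences are kept.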


