[R] R regular expression to extract words with the query string.

Gabor Grothendieck ggrothendieck at gmail.com
Thu Jul 9 05:14:44 CEST 2009


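The solution below returns only "ENSP000012345", without the pid: prefix,
because \w does not match the colon. This modification, which matches any
run of non-space characters around ENSP, works: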

> strapply(i, paste("[^ ]*", "ENSP", "[^ ]*", sep = ""), c, simplify = unlist)
[1] "pid:ENSP000012345"
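
For anyone without gsubfn installed, the same contrast can be sketched in
base R with regmatches() and gregexpr(). This is only an illustrative
equivalent of the two patterns in this thread, not part of the gsubfn
approach; the variable names word_only and whole_token are made up for
the example.

```r
# Base R sketch (no gsubfn needed); variable names are illustrative only.
i <- "transcript:ENST0000112334 pid:ENSP000012345"

# \\w matches letters, digits and underscore only, so the match cannot
# extend left across the ":" and the "pid:" prefix is left out.
word_only <- regmatches(i, gregexpr("\\w*ENSP\\w*", i))[[1]]

# [^ ] matches any character except a space, so the whole
# space-delimited token containing ENSP is returned.
whole_token <- regmatches(i, gregexpr("[^ ]*ENSP[^ ]*", i))[[1]]

print(word_only)
print(whole_token)
```

gregexpr() finds every match in the string, so an input with several
ENSP tokens would give a character vector with one element per token.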

On Wed, Jul 8, 2009 at 10:08 AM, Gabor
Grothendieck<ggrothendieck at gmail.com> wrote:
> Try this:
>
> library(gsubfn)
> i <- "transcript:ENST0000112334 pid:ENSP000012345"
> strapply(i, paste("\\w*", "ENSP", "\\w*", sep = ""), c, simplify = unlist)
>
> This pattern matches any number (possibly zero) of word
> characters, followed by ENSP, followed by any number of further
> word characters.  The function c returns each match without
> further processing, and simplify = unlist flattens the result
> into a character vector (it would otherwise be a list).
>
> See http://gsubfn.googlecode.com for more info.
>
> On Wed, Jul 8, 2009 at 9:04 AM, Praveen
> Surendran<praveen.surendran at ucd.ie> wrote:
>> Hi,
>>
>> Is there a way in R to extract the substring that matches an expression,
>> where the expression matches only part of the parent string?
>>
>> Let's say I have i <- "transcript:ENST0000112334 pid:ENSP000012345"
>>
>> What I need is the string "pid:ENSP000012345" from i using the query
>> "ENSP".
>>
>> Appreciate your comments.
>>
>>
>>
>> Praveen  Surendran
>>
>> School of Medicine and Medical Sciences
>>
>> University College Dublin
>>
>> Belfield, Dublin 4
>>
>> Ireland.