[R] regular expression

Gabor Grothendieck ggrothendieck at gmail.com
Thu Mar 1 00:34:00 CET 2012


On Wed, Feb 29, 2012 at 2:24 PM, Fred G <bayespokerguy at gmail.com> wrote:
> Computer Friends,
>
> with the following example lines:
>
> [107] "98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1"
>
> [108] "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1"
>
> i want to be able to isolate the number of months of survival for each row.
>
> is there a regular expression that can find the first instance of a ";",
> delete everything in front of it-- and find the second instance of an ";"
> and delete everything behind it? in python there is a function line.find(),
> would be grateful to hear the R equiv; or, any other better alternatives to
> get the number of months of survival stored as a variable.
>

This extracts the months of survival, i.e. the digits immediately
followed by a semicolon:

# sample data
Lines <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
"99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

library(gsubfn)
strapply(Lines, "(\\d+);", as.numeric, simplify = TRUE)


# We can also get all numeric fields in case that is of interest:

strapply(Lines, "\\d+", as.numeric, simplify = rbind)
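
If installing gsubfn is not an option, something along these lines should
also work in base R. This is only a sketch: it assumes every line has the
same "Surv(months): <digits>;" layout as the two sample lines, and the
variable names (months, months2, pos) are made up for illustration.

```r
# sample data (same two lines as above)
Lines <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
           "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

# 1. Keep only the digits that follow "Surv(months):" and precede ";".
#    Assumes the label is spelled exactly like this on every line.
months <- as.numeric(sub(".*Surv\\(months\\): *([0-9]+);.*", "\\1", Lines))

# 2. Closer to the wording of the question: split each line on ";",
#    take the second field, and drop everything up to the last ": ".
fields  <- strsplit(Lines, ";", fixed = TRUE)
months2 <- as.numeric(sub(".*: *", "", sapply(fields, `[`, 2)))

# 3. regexpr() returns the position of the first match (-1 if none),
#    which is roughly what Python's line.find(";") gives, except that
#    R positions start at 1.
pos <- regexpr(";", Lines, fixed = TRUE)
```

The first form depends on the field label, the second on the field order, so
which one is safer depends on how regular the rest of the file is.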


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com