[R] regular expression

David Winsemius dwinsemius at comcast.net
Thu Mar 1 00:35:31 CET 2012


On Feb 29, 2012, at 2:24 PM, Fred G wrote:

> Computer Friends,
>
> with the following example lines:

Modified to be correct R code. Please emulate my example in the future.

inp <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
         "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

>
> i want to be able to isolate the number of months of survival for  
> each row.
>
> is there a regular expression that can find the first instance of a  
> ";",
> delete everything in front of it-- and find the second instance of  
> an ";"
> and delete everything behind it? in python there is a function  
> line.find(),
> would be grateful to hear the R equiv; or, any other better  
> alternatives to
> get the number of months of survival stored as a variable.

You can use either regex methods (noting that the "?" is necessary to
defeat the default greedy nature of regex matching):


 > sub( ";.+$", "", sub("^.+?;", "", inp) )
[1] " Surv(months): 6"  " Surv(months): 21"
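
To illustrate why the "?" matters, and to go one step further to the numeric value the question asked for, here is a minimal sketch. The variable names greedy, lazy and months are only illustrative, and the digit-stripping step assumes the middle field contains no digits other than the month count.

```r
inp <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
         "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

# Greedy: "^.+;" runs up to the LAST ";", so the months field is deleted too
greedy <- sub("^.+;", "", inp)

# Non-greedy: "^.+?;" stops at the FIRST ";", leaving the months field in place
lazy <- sub("^.+?;", "", inp)

# Drop everything from the next ";" onward, strip non-digits, convert to numeric
# (assumes the only digits left in the field are the month count)
months <- as.numeric(gsub("[^0-9]", "", sub(";.+$", "", lazy)))
```

With the greedy pattern only the STATUS field survives, which is why the non-greedy form is used above.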

...  or you can read these as lines and pass the results to read.table  
with sep =";".

 > read.table(text=inp, sep=";", stringsAsFactors=FALSE)[ ,2]
[1] " Surv(months): 6"  " Surv(months): 21"
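
For the position-based approach the question mentions (Python's line.find()), a rough R counterpart is gregexpr() with fixed=TRUE, which reports where each ";" occurs. This is only a sketch: it assumes every line contains at least two ";" characters, and the names pos, first, second, field and months are illustrative.

```r
inp <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
         "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

# Positions of every literal ";" in each string (one integer vector per string)
pos <- gregexpr(";", inp, fixed = TRUE)

# Position of the first and second ";" in each string
# (assumes each line has at least two of them)
first  <- sapply(pos, `[`, 1)
second <- sapply(pos, `[`, 2)

# Text strictly between the first and second ";"
field <- substring(inp, first + 1, second - 1)

# Remove the label up to and including ": ", then convert to numeric
months <- as.numeric(sub("^.*: *", "", field))
```

Here field matches the strings returned by the two methods above, and months holds the survival times as a numeric vector.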

>
> 	[[alternative HTML version deleted]]

Please learn to post in plain text.

-- 

David Winsemius, MD
West Hartford, CT


