[R] question about string handling....

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jul 14 22:08:05 CEST 2010


On Wed, Jul 14, 2010 at 2:21 PM, karena <dr.jzhou at gmail.com> wrote:
>
> Hi,
>
> I have a data.frame as follows:
> var1         var2
> 1           ab_c_(ok)
> 2           okf789(db)_c
> 3           jojfiod(90).gt
> 4           "ij"_(78)__op
> 5           (iojfodjfo)_ab
>
> What I want is to create a new variable called "var3". The value of var3 is
> the content in the parentheses, so var3 would be:
> var3
> ok
> db
> 90
> 78
> iojfodjfo
>

Here are several alternatives.  The gsub solution matches everything
up to and including the ( as well as everything from the ) onward and
replaces each with nothing.  The strsplit solution splits each string
at the ( and the ) into the text before the (, the text within the (),
and any text after the ), and then picks off the second field.  The
strapply solution matches everything from ( to ) and returns
everything between them.  The code below works whether DF$var2 is
factor or character, but if you know it's character you can drop the
as.character in #2 and #3.

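# 1: delete everything through "(" and everything from ")" onward
gsub(".*[(]|[)].*", "", DF$var2)

# 2: split on "(" or ")" and keep the second field of each result
sapply(strsplit(as.character(DF$var2), "[()]"), "[", 2)

# 3: return the captured text between "(" and ")" (needs the gsubfn package)
library(gsubfn)
strapply(as.character(DF$var2), "[(](.*)[)]", simplify = TRUE)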
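
To try the two base R solutions on the posted data, something like the
following sketch should do.  The way DF is built here is an assumption
(the original post only shows the printed data frame), and the names
v1, v2 and no_parens are made up for illustration.  var2 is created as
a factor to show that both calls cope with that.

```r
# Rebuild the example data (assumed construction; var2 stored as a factor)
DF <- data.frame(
  var1 = 1:5,
  var2 = c("ab_c_(ok)", "okf789(db)_c", "jojfiod(90).gt",
           "\"ij\"_(78)__op", "(iojfodjfo)_ab"),
  stringsAsFactors = TRUE
)

# 1: strip everything through "(" and everything from ")" onward
v1 <- gsub(".*[(]|[)].*", "", DF$var2)

# 2: split on "(" or ")" and take the second field
v2 <- sapply(strsplit(as.character(DF$var2), "[()]"), "[", 2)

DF$var3 <- v1
print(DF$var3)
# [1] "ok"        "db"        "90"        "78"        "iojfodjfo"

# The two approaches differ when a string has no parentheses at all:
no_parens <- "abc"
gsub(".*[(]|[)].*", "", no_parens)              # "abc" (returned unchanged)
sapply(strsplit(no_parens, "[()]"), "[", 2)     # NA (there is no second field)
```

So if some rows might lack parentheses, #2 flags them with NA, whereas
#1 silently hands back the original string.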


