[R] recode according to specific sequence of characters within a string variable

Marc Schwartz marc_schwartz at me.com
Fri Feb 4 14:09:46 CET 2011


On Feb 4, 2011, at 6:32 AM, D. Alain wrote:

> Dear R-List, 
> 
> I have a data frame with a column "name.of.report" containing character values, e.g.
> 
> 
>> df$name.of.report
> 
> "jeff_2001_teamx"
> "teamy_jeff_2002"
> "robert_2002_teamz"
> "mary_2002_teamz"
> "2003_mary_teamy"
> ...
> (i.e., the parts of interest are not always in the same position)
> 
> Now I want to recode the column "name.of.report" into three variables, "person", "year" and "team", like this:
> 
>> new.df
> 
> person  year  team
> jeff    2001  x
> jeff    2002  y
> robert  2002  z
> mary    2002  z
> 
> I tried with grep()
> 
> df$person<-grep("jeff",df$name.of.report)
> 
> but of course it did not give me what I wanted. I could not find a solution via RSeek. Excuse me if this is a very silly question, but can anyone help me find a way out of this?
> 
> Thanks a lot
> 
> Alain


There are several possible approaches, most of them involving regular expressions (see ?regex). Here is one:


DF <- data.frame(name.of.report = c("jeff_2001_teamx", "teamy_jeff_2002", 
                                    "robert_2002_teamz", "mary_2002_teamz", 
                                    "2003_mary_teamy"))

> DF
     name.of.report
1   jeff_2001_teamx
2   teamy_jeff_2002
3 robert_2002_teamz
4   mary_2002_teamz
5   2003_mary_teamy


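# person: delete underscores, digits and "team" plus the character after it
DF.new <- data.frame(person = gsub("[_0-9]|team.", "", DF$name.of.report),
                     # year: keep only the four-digit run captured by ( )
                     year = gsub(".*([0-9]{4}).*", "\\1", DF$name.of.report),
                     # team: keep only the single character following "team"
                     team = gsub(".*team(.).*", "\\1", DF$name.of.report))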


> DF.new
  person year team
1   jeff 2001    x
2   jeff 2002    y
3 robert 2002    z
4   mary 2002    z
5   mary 2003    y
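Another of the possible approaches avoids writing one regex per column: split each name on "_" and classify the pieces by their shape. This is only a sketch and it assumes that every name contains exactly one four-digit year, exactly one token starting with "team", and exactly one remaining token for the person; names that break that pattern would need extra handling. The helper parse_tokens() and the object DF.new2 are hypothetical names used just for illustration.

```r
# Sample data from the question; stringsAsFactors = FALSE because
# strsplit() needs character input, not a factor
DF <- data.frame(name.of.report = c("jeff_2001_teamx", "teamy_jeff_2002",
                                    "robert_2002_teamz", "mary_2002_teamz",
                                    "2003_mary_teamy"),
                 stringsAsFactors = FALSE)

# Split each report name into its underscore-separated tokens
tokens <- strsplit(DF$name.of.report, "_", fixed = TRUE)

# Hypothetical helper: classify tokens by shape.
# Assumes exactly one year, one "team..." token and one person token per name.
parse_tokens <- function(tok) {
  is_year <- grepl("^[0-9]{4}$", tok)
  is_team <- grepl("^team", tok)
  c(person = tok[!is_year & !is_team],
    year   = tok[is_year],
    team   = sub("^team", "", tok[is_team]))
}

# Stack the per-name results into a data frame
DF.new2 <- as.data.frame(do.call(rbind, lapply(tokens, parse_tokens)),
                         stringsAsFactors = FALSE)

# The year comes back as text, so convert it to an integer
DF.new2$year <- as.integer(DF.new2$year)
DF.new2
```

Unlike the gsub() version for "person", this does not rely on the person's name being free of digits; it does, however, rely on "_" being the only separator.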



HTH,

Marc Schwartz


