[R] recode according to specific sequence of characters within a string variable
Marc Schwartz
marc_schwartz at me.com
Fri Feb 4 14:09:46 CET 2011
On Feb 4, 2011, at 6:32 AM, D. Alain wrote:
> Dear R-List,
>
> I have a dataframe with one column "name.of.report" containing character values, e.g.
>
>
>> df$name.of.report
>
> "jeff_2001_teamx"
> "teamy_jeff_2002"
> "robert_2002_teamz"
> "mary_2002_teamz"
> "2003_mary_teamy"
> ...
> (i.e. the bit of interest is not always at same position)
>
> Now I want to recode the column "name.of.report" into the variables "person", "year","team", like this
>
>> new.df
>
> "person" "year" "team"
> jeff 2001 x
> jeff 2002 y
> robert 2002 z
> mary 2002 z
>
> I tried with grep()
>
> df$person<-grep("jeff",df$name.of.report)
>
> but of course it didn't exactly result in what I wanted to do. Could not find any solution via RSeek. Excuse me if it is a very silly question, but can anyone help me find a way out of this?
>
> Thanks a lot
>
> Alain
There are several possible approaches, most of them relying on regular expressions (see ?regex). Here is one:
DF <- data.frame(name.of.report = c("jeff_2001_teamx", "teamy_jeff_2002",
                                    "robert_2002_teamz", "mary_2002_teamz",
                                    "2003_mary_teamy"))
> DF
     name.of.report
1   jeff_2001_teamx
2   teamy_jeff_2002
3 robert_2002_teamz
4   mary_2002_teamz
5   2003_mary_teamy
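# person: delete underscores, digits and the "team?" token, leaving the name
DF.new <- data.frame(person = gsub("[_0-9]|team.", "", DF$name.of.report),
                     # year: keep only the four-digit run
                     year = gsub(".*([0-9]{4}).*", "\\1", DF$name.of.report),
                     # team: keep only the single character after "team"
                     team = gsub(".*team(.).*", "\\1", DF$name.of.report))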
> DF.new
  person year team
1   jeff 2001    x
2   jeff 2002    y
3 robert 2002    z
4   mary 2002    z
5   mary 2003    y
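Another option, as a rough sketch only: split each value on "_" and classify the pieces, rather than deleting what you do not want. The helper name parse_report and the result name DF.split are made up for illustration. It assumes every value contains exactly one four-digit year, exactly one piece starting with "team", and exactly one remaining piece that is the person (so a person whose name begins with "team" would be misclassified).

```r
# Hypothetical helper: classify the "_"-separated pieces of one report name.
# Assumes exactly one year piece, one "team" piece and one person piece.
parse_report <- function(x) {
  parts   <- strsplit(x, "_", fixed = TRUE)[[1]]
  is_year <- grepl("^[0-9]{4}$", parts)   # a piece made of exactly four digits
  is_team <- grepl("^team", parts)        # a piece starting with "team"
  data.frame(person = parts[!is_year & !is_team],
             year   = as.integer(parts[is_year]),
             team   = sub("^team", "", parts[is_team]),
             stringsAsFactors = FALSE)
}

reports <- c("jeff_2001_teamx", "teamy_jeff_2002", "robert_2002_teamz",
             "mary_2002_teamz", "2003_mary_teamy")

# One row per report name, stacked into a single data frame
DF.split <- do.call(rbind, lapply(reports, parse_report))
```

Compared with the gsub() version, this returns year as an integer rather than as text, and it fails with an error instead of quietly returning an odd string when a value does not contain exactly one piece of each kind.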
HTH,
Marc Schwartz