[R] Add column to dataframe based on code in other column

Bert Gunter gunter.berton at gene.com
Thu Aug 8 17:06:36 CEST 2013


Dark:

1. In the future, please use dput() to post data so that we can read
it into R directly from your email.
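
As a minimal sketch of what that looks like, assuming the example data
frame from this thread (the name df2 and the simulated round trip are
only for illustration):

```r
# Reconstruction of the example data frame from this thread
df <- data.frame(Name = c("Tom", "Harry", "Ben", "Sally"),
                 State_Code = c(20, 56, 5, 4),
                 stringsAsFactors = FALSE)

# dput() prints R code that rebuilds the object exactly;
# paste that printed text into the email body
dput(df)

# A reader recreates the object by evaluating the pasted text.
# Here the round trip is simulated by capturing and re-parsing the output.
df2 <- eval(parse(text = capture.output(dput(df))))
all.equal(df, df2)
```

Unlike a printed table, the dput() output preserves column types, so the
reader does not have to guess whether State_Code is numeric or character.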

2. As Berend demonstrates, using a more appropriate data structure is
what's required. Here is a slightly shorter, but perhaps trickier
alternative to his solution:

> df  ## Your example data frame
   Name State_Code
1   Tom         20
2 Harry         56
3   Ben          5
4 Sally          4

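> ## Gather the region vectors into one named list
> l <- list(MidWest = MidWest, South = South, NorthEast = NorthEast, Other = Other, West = West)
> ## Repeat each region name once per code, then index by each State_Code's position
> df <- within(df, regions <- rep(names(l), sapply(l, length))[match(State_Code, unlist(l))])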
> df
   Name State_Code   regions
1   Tom         20 NorthEast
2 Harry         56     Other
3   Ben          5      West
4 Sally          4     South
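
Since the one-liner is admittedly tricky, here is a sketch that unpacks
it into separate steps. The region vectors are copied from the original
question; the intermediate names codes, labels and pos are introduced
here only for illustration.

```r
# Region vectors as given in the original question
NorthEast <- c(7, 20, 22, 30, 31, 33, 39, 41, 47)
MidWest   <- c(14, 15, 16, 17, 23, 24, 26, 28, 35, 36, 43, 52)
South     <- c(1, 4, 8, 9, 10, 11, 18, 19, 21, 25, 34, 37, 42, 44, 45, 49, 51)
West      <- c(2, 3, 5, 6, 12, 13, 27, 29, 32, 38, 46, 50, 53)
Other     <- c(40, 48, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 94,
               98, 99)

df <- data.frame(Name = c("Tom", "Harry", "Ben", "Sally"),
                 State_Code = c(20, 56, 5, 4),
                 stringsAsFactors = FALSE)

l <- list(MidWest = MidWest, South = South, NorthEast = NorthEast,
          Other = Other, West = West)

codes  <- unlist(l)                         # all state codes, in list order
labels <- rep(names(l), sapply(l, length))  # region name aligned with each code
pos    <- match(df$State_Code, codes)       # index of each State_Code in codes
df$regions <- labels[pos]                   # NA for any code not in the list
df
```

Because codes and labels have the same length and order, the position
found by match() in one indexes the right region name in the other.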

3. Needless to say, there may be other alternatives that are better.
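
One such alternative, offered only as a sketch and not as anything
proposed in this thread, is to let stack() build the code-to-region
lookup table from the named list. The name lookup is hypothetical; the
region vectors are again copied from the original question.

```r
# Region vectors as given in the original question
NorthEast <- c(7, 20, 22, 30, 31, 33, 39, 41, 47)
MidWest   <- c(14, 15, 16, 17, 23, 24, 26, 28, 35, 36, 43, 52)
South     <- c(1, 4, 8, 9, 10, 11, 18, 19, 21, 25, 34, 37, 42, 44, 45, 49, 51)
West      <- c(2, 3, 5, 6, 12, 13, 27, 29, 32, 38, 46, 50, 53)
Other     <- c(40, 48, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 94,
               98, 99)

df <- data.frame(Name = c("Tom", "Harry", "Ben", "Sally"),
                 State_Code = c(20, 56, 5, 4),
                 stringsAsFactors = FALSE)

l <- list(MidWest = MidWest, South = South, NorthEast = NorthEast,
          Other = Other, West = West)

# stack() turns the named list into a two-column data frame:
# 'values' holds the codes, 'ind' holds the region name as a factor
lookup <- stack(l)

# Look up each State_Code; codes missing from the table give NA
df$regions <- as.character(lookup$ind[match(df$State_Code, lookup$values)])
df
```

The match() step is the same as before; stack() merely replaces the
manual rep()/unlist() bookkeeping with an explicit lookup table that can
be inspected or reused.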

Cheers,
Bert


On Thu, Aug 8, 2013 at 7:14 AM, Berend Hasselman <bhh at xs4all.nl> wrote:
>
> On 08-08-2013, at 11:33, Dark <info at software-solutions.nl> wrote:
>
>> Hi all,
>>
>> I have a dataframe of users which contains US state codes.
>> Now I want to add a column named REGION based on the state code. I have
>> already done a mapping:
>>
>> NorthEast <- c(07, 20, 22, 30, 31, 33, 39, 41, 47)
>> MidWest <- c(14, 15, 16, 17, 23, 24, 26, 28, 35, 36, 43, 52)
>> South <- c(01, 04, 08, 09, 10, 11, 18, 19, 21, 25, 34, 37, 42, 44, 45, 49,
>> 51)
>> West <- c(02, 03, 05, 06, 12, 13, 27, 29, 32, 38, 46, 50, 53)
>> Other <- c(40, 48, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 94,
>> 98, 99)
>>
>> So for example:
>> Name    State_Code
>> Tom     20
>> Harry   56
>> Ben     05
>> Sally   04
>>
>> Should become:
>> Name    State_Code   REGION
>> Tom     20           NorthEast
>> Harry   56           Other
>> Ben     05           West
>> Sally   04           South
>>
>
> dd <- read.table(text="Name    State_Code
> Tom       20
> Harry     56
> Ben         05
> Sally       04", header=TRUE, stringsAsFactors=FALSE)
>
> # Create table for regions indexed by state_code
>
> region.table <- rep("UNKNOWN",99)
> region.table[NorthEast] <- "NorthEast"
> region.table[MidWest] <- "MidWest"
> region.table[South] <- "South"
> region.table[West] <- "West"
> region.table[Other] <- "Other"
> region.table
>
> # then this is easy
>
> dd[,"REGION"] <- region.table[dd$State_Code]
>
>
> Berend



-- 

Bert Gunter
Genentech Nonclinical Biostatistics



