[R] creating a derived variable in a data frame

Greg Snow greg.snow at ihc.com
Thu Oct 20 17:37:12 CEST 2005


>>>> "Martin Henry H. Stevens" <HStevens at MUOhio.edu> 10/20/05 08:47AM
>>>
>Hi Avram-
>How many countries do you have?
>I would do it the following way because it is simple and I don't know
>any better, even if it is absurdly painstaking.
>
>#Step 1
>mydata$continent <- factor(NA, levels=c("NoAm","Euro"))
>
>#Steps 2 a-z
>mydata$continent[mydata$country=="US" |
>                 mydata$country=="CA" |
>                 mydata$country=="MX" ]  <- "NoAm"

A shorter alternative to the above is to use %in%, like this:

mydata$continent[ mydata$country %in% c("US","CA","MX") ] <- "NoAm"
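
As a minimal sketch of how the two steps fit together, here is a small
self-contained example. The toy data values and the "Euro" country list
are assumptions made up for illustration; only the column names and the
"NoAm" list come from the thread.

```r
# Toy data; column names follow the thread, the values are invented.
mydata <- data.frame(country = c("US", "FR", "CA", "DE", "MX", "JP"),
                     x = c(10, 20, 30, 40, 50, 60),
                     stringsAsFactors = FALSE)

# Step 1: an all-NA factor that already knows every allowed level.
mydata$continent <- factor(NA, levels = c("NoAm", "Euro"))

# Step 2: one assignment per continent; %in% replaces the chain of == and |.
mydata$continent[mydata$country %in% c("US", "CA", "MX")] <- "NoAm"
mydata$continent[mydata$country %in% c("FR", "DE")] <- "Euro"  # assumed list

# Countries not covered by any rule (here "JP") stay NA.
table(mydata$continent, useNA = "ifany")
```

One side benefit: %in% returns FALSE rather than NA when a country
value is missing, so the logical index never contains NA.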

You could also create a new data frame with 2 columns, one for the
country and one for the corresponding continent, then merge this with
your data (see ?merge).
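
A rough sketch of the merge approach follows, ending with the
percentage of x by continent that the original question asks for. The
lookup table contents, the toy data, and the names "lookup", "merged",
"totals", and "pct" are all hypothetical; a real lookup would list
every country code in the data.

```r
# Toy data; column names follow the thread, the values are invented.
mydata <- data.frame(country = c("US", "FR", "CA", "DE", "MX", "JP"),
                     x = c(10, 20, 30, 40, 50, 60),
                     stringsAsFactors = FALSE)

# Hypothetical lookup table: one row per country code.
lookup <- data.frame(country = c("US", "CA", "MX", "FR", "DE"),
                     continent = c("North America", "North America",
                                   "North America", "Europe", "Europe"),
                     stringsAsFactors = FALSE)

# all.x = TRUE keeps rows whose country is absent from the lookup;
# their continent becomes NA. merge() may reorder the rows.
merged <- merge(mydata, lookup, by = "country", all.x = TRUE)

# Percentage of x by continent. tapply() drops rows with an NA
# continent, so these percentages cover only the matched countries.
totals <- tapply(merged$x, merged$continent, sum)
pct <- 100 * totals / sum(totals)
pct
```

With this toy data, "JP" has no match, so its x is left out of the
totals. The lookup table only needs one row per country, which keeps
the mapping in one place instead of spread over many assignments.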

>
>#Repeat for all countries and continents.
>
>Hank
>
>
>On Oct 19, 2005, at 8:09 PM, Avram Aelony wrote:
>
>> Hello,
>>
>> I have read through the manuals and can't seem to find an answer.
>>
>> I have a categorical, character variable that has hundreds of
>> values.  I want to group the existing values of this variable into
>> a new, derived (categorical) variable by applying conditions to the
>> values in the data.
>>
>> For example, suppose I have a data frame with variables: date,  
>> country, x, y, and z.
>>
>> x,y,z are numeric and country is a 2-digit character string.  I
>> want to create a new derived variable named "continent" that would
>> also exist in the data frame. The Continent variable would have
>> values of "Asia", "Europe", "North America", etc...
>>
>> How would this best be done for a large dataset (>10MB) ?
>> I have tried many variations on the following without success (note
>> in a real example I would have a longer list of countries and
>> continent values):
>>
>>
>>> mydata$continent <- mydata[ mydata$country==list('US','CA','MX'), ] -> "North America"
>>>
>>
>> I have read about factors, but I am not sure how they apply here.
>>
>> Can anyone help me with the syntax?  I am sure it is trivial and a
>> common thing to do.
>> The ultimate goal is to compute percentages of x by continent.
>>
>> Thanks for any help in advance.
>>
>> -Avram
>


Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow at ihc.com
(801) 408-8111



