[R] turning comma separated string from multiple choices into flags

June Kim juneaftn at gmail.com
Mon Sep 29 17:12:34 CEST 2008


Thank you. The misspelling of Harvard wasn't intended. The data are
spelled consistently.

2008/9/30 Peter Dalgaard <P.Dalgaard at biostat.ku.dk>:
> June Kim wrote:
>> Hello,
>>
>> I use Google Docs' Forms to conduct surveys online. Multiple-choice
>> questions are coded as comma-separated values.
>>
>> For example,
>>
>> if the question is like:
>>
>> 1. What magazines do you currently subscribe to? (you can choose
>> more than one)
>> 1) Fast Company
>> 2) Havard Business Review
>> 3) Business Week
>> 4) The Economist
>>
>> If the respondent chooses 1) and 3), the answer is stored in a single
>> spreadsheet cell as:
>>
>> "Fast Company, Business Week"
>>
>> I read the data into R with read.csv. To analyze the data, I need to
>> turn that string into something like flags (indicator variables?).
>> That is, there should be 4 variables whose values are either 1 or 0,
>> indicating chosen or not chosen, respectively.
>>
>> Suppose the data look something like this:
>>
>>
>>> survey1
>>>
>>   age                                    favorite_magazine
>> 1  29                                         Fast Company
>> 2  31                          Fast Company, Business Week
>> 3  32 Havard Business Review, Business Week, The Economist
>>
>>
>> Then I need to split the string in the favorite_magazine column to
>> turn that data into something like this:
>>
>>
>>> survey1transformed
>>>
>>   age Fast Company Havard Business Review Business Week The Economist
>> 1  29            1                      0             0             0
>> 2  31            1                      0             1             0
>> 3  32            0                      1             1             1
>>
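A minimal sketch of this transformation in base R. It assumes that answers are always separated by a comma followed by a space (as in the example) and that the full list of options is known in advance. Each answer is split with strsplit(), and each option is then tested with %in%, so options are matched as whole strings rather than as substrings. The names `magazines`, `choices` and `flags` are illustrative only.

```r
# Recreate the example data from the message above.
survey1 <- data.frame(
  age = c(29, 31, 32),
  favorite_magazine = c("Fast Company",
                        "Fast Company, Business Week",
                        "Havard Business Review, Business Week, The Economist"),
  stringsAsFactors = FALSE
)

# Split each answer on ", " (assumed separator, as in the example).
choices <- strsplit(survey1$favorite_magazine, ", ", fixed = TRUE)

# All possible answers, in questionnaire order (assumed known).
magazines <- c("Fast Company", "Havard Business Review",
               "Business Week", "The Economist")

# One row per respondent, one 0/1 column per option (1 = chosen).
flags <- t(sapply(choices, function(chosen) as.integer(magazines %in% chosen)))
colnames(flags) <- magazines

# check.names = FALSE keeps the spaces in the option names.
survey1transformed <- data.frame(age = survey1$age, flags, check.names = FALSE)
survey1transformed
```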
>>
>> Actually, I have many more multiple-choice questions in the survey.
>>
>> What is an easy, elegant, and natural way to do this in R?
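When a survey has many multiple-choice questions, the same idea can be wrapped in a function and applied to one column at a time. This is a sketch under stated assumptions: `expand_choices` is a hypothetical helper (it is not part of base R or any package), each column's options are taken from the answers that actually appear in it, and `news_source` is an invented second question used only to illustrate the technique.

```r
# Hypothetical helper: expand one comma-separated multiple-choice column
# into 0/1 indicator columns named "<column>.<option>".
expand_choices <- function(df, column, sep = ", ") {
  chosen <- strsplit(as.character(df[[column]]), sep, fixed = TRUE)
  # Options come from the answers present in this column, sorted.
  opts <- sort(unique(unlist(chosen)))
  # One row per respondent, one column per option (1 = chosen).
  flags <- matrix(unlist(lapply(chosen, function(x) as.integer(opts %in% x))),
                  ncol = length(opts), byrow = TRUE)
  colnames(flags) <- paste(column, opts, sep = ".")
  # Replace the original text column with its indicator columns.
  data.frame(df[setdiff(names(df), column)], flags, check.names = FALSE)
}

survey <- data.frame(
  age = c(29, 31, 32),
  favorite_magazine = c("Fast Company",
                        "Fast Company, Business Week",
                        "Havard Business Review, Business Week, The Economist"),
  news_source = c("TV, Radio", "Web", "Web, TV"),  # hypothetical question
  stringsAsFactors = FALSE
)

# Apply the helper to every multiple-choice column in turn.
wide <- Reduce(expand_choices, c("favorite_magazine", "news_source"), survey)
wide
```

Reduce() passes the data frame through the helper once for each question column, so adding another question only requires adding its column name to the vector. Because the options are derived from the data, a choice that no respondent ticked does not get a column. Supplying the questionnaire's fixed option list would avoid that.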
>>
>
> I'd look into something like as.data.frame(lapply(strings, grep,
> x=favorite_magazine, fixed=TRUE)), where strings <- c("Fast Company",
> "Havard Business Review", ...).
>
> (I take it that the mechanism is such that you can rely on at least
> having everything misspelled in the same way? If it is alternately
> "Havard" and "Harvard", then things get a bit trickier.)
>
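Peter's one-liner is offered only as "something like", and by default grep() returns the positions of the matching elements, not one value per respondent. The sketch below uses grepl() instead, which returns a logical vector the same length as x. It assumes the answers are in a character vector called favorite_magazine and that `strings` lists every option. The four options here come from the example.

```r
favorite_magazine <- c("Fast Company",
                       "Fast Company, Business Week",
                       "Havard Business Review, Business Week, The Economist")
strings <- c("Fast Company", "Havard Business Review",
             "Business Week", "The Economist")

# lapply() calls grepl() once per option, passing the option as `pattern`;
# each call returns one TRUE/FALSE per respondent.
hits <- lapply(strings, grepl, x = favorite_magazine, fixed = TRUE)
names(hits) <- strings

# Convert TRUE/FALSE to 1/0 and keep the option text as column names.
flags <- data.frame(lapply(hits, as.integer), check.names = FALSE)
flags
```

With fixed = TRUE this is a plain substring test. An option whose text occurs inside another option's text (for example, a hypothetical "Week" listed alongside "Business Week") would therefore be flagged as well. Splitting each answer on the separator and comparing whole options avoids that problem.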
> --
>   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>
>
>


More information about the R-help mailing list