[R] turning comma separated string from multiple choices into flags
June Kim
juneaftn at gmail.com
Mon Sep 29 17:12:34 CEST 2008
Thank you. The misspelling of Harvard wasn't intended. The data are
spelled consistently.
2008/9/30 Peter Dalgaard <P.Dalgaard at biostat.ku.dk>:
> June Kim wrote:
>> Hello,
>>
>> I use Google Docs' Forms to conduct surveys online. Multiple-choice
>> questions are coded as comma-separated values.
>>
>> For example,
>>
>> if the question is like:
>>
>> 1. What magazines do you currently subscribe to? (you can choose
>> multiple choices)
>> 1) Fast Company
>> 2) Havard Business Review
>> 3) Business Week
>> 4) The Economist
>>
>> And if the subject chose 1) and 3), the data is coded as a cell in a
>> spreadsheet as,
>>
>> "Fast Company, Business Week"
>>
>> I read the data with read.csv into R. To analyze the data, I have to
>> change that string into something like flags (indicator variables?).
>> That is, there should be 4 variables, whose values are either 1 or
>> 0, indicating chosen or not chosen respectively.
>>
>> Suppose the data is something like,
>>
>>
>> > survey1
>>   age favorite_magazine
>> 1  29 Fast Company
>> 2  31 Fast Company, Business Week
>> 3  32 Havard Business Review, Business Week, The Economist
>>
>>
>> Then I have to chop the string in favorite_magazine column to turn
>> that data into something like,
>>
>>
>> > survey1transformed
>>   age Fast Company Havard Business Review Business Week The Economist
>> 1  29            1                      0             0             0
>> 2  31            1                      0             1             0
>> 3  32            0                      1             1             1
>>
>>
>> Actually I have many more multiple choice questions in the survey.
>>
>> What is an easy, elegant, and natural way in R to do the job?
>>
>
> I'd look into something like as.data.frame(lapply(strings, grep,
> x=favorite_magazine, fixed=TRUE)), where strings <- c("Fast Company",
> "Havard Business Review", ...).
>
> (I take it that the mechanism is such that you can rely on at least
> having everything misspelled in the same way? If it is alternatingly
> "Havard" and "Harvard", then things get a bit trickier.)
>
> --
> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
>
>
>
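
A minimal sketch of the approach suggested above, applied to the example
data from the question. It assumes the choice labels are spelled exactly as
in the data and that no label is a substring of another, since the match is
a plain substring test. Note that grep() returns the indices of the matching
rows, so the sketch uses grepl(), which returns one TRUE/FALSE per row. The
names make_flags and choices are hypothetical and not part of the thread.

```r
# Example data as shown in the question (spelling kept as in the data).
survey1 <- data.frame(
  age = c(29, 31, 32),
  favorite_magazine = c(
    "Fast Company",
    "Fast Company, Business Week",
    "Havard Business Review, Business Week, The Economist"
  ),
  stringsAsFactors = FALSE
)

# The answer choices, spelled exactly as they appear in the data.
choices <- c("Fast Company", "Havard Business Review",
             "Business Week", "The Economist")

# Hypothetical helper: build one 0/1 indicator column per choice.
# fixed = TRUE treats each choice as a literal string, not a regular expression.
make_flags <- function(x, choices) {
  flags <- lapply(choices, function(ch) {
    as.integer(grepl(ch, as.character(x), fixed = TRUE))
  })
  names(flags) <- choices
  # check.names = FALSE keeps column names containing spaces unchanged.
  data.frame(flags, check.names = FALSE)
}

# Combine the untouched columns with the new indicator columns.
survey1transformed <- cbind(
  survey1["age"],
  make_flags(survey1$favorite_magazine, choices)
)
print(survey1transformed)
```

For the other multiple-choice columns, the same helper could be called once
per column with that question's own vector of choices.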