[R] datastructure for multi-choice factors

Thu Jul 7 10:18:25 CEST 2011

I am working on a system to visualize survey responses. Survey responses
typically consist of factors, numerics, timestamps and text fields, and
therefore fit nicely into data frames, which makes them easy to visualize
using standard R functions.

However, I am currently working on a survey that also includes questions in
which the respondent can check more than one answer on a single multi-choice
item. In other words, this is a factor for which every row can hold multiple
responses. I am looking for a way to put this into a data frame together with
the other questions of the survey.

I considered two workarounds, but both are problematic:

- Column-wise expanding: convert a single multi-choice item into N binary
columns, one for every possible response (level), with 1/0 values
indicating whether that answer was checked. The problem with this is that
you lose the information that these N columns are in fact one question, and
it becomes very hard to visualize this single question.

- Row-wise expanding: convert a single response into N rows, one for every
answer that was checked. The problem with this is that if the factor is part
of the data frame, all of the other items have to be duplicated as well,
leading to artificial results for those items.
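
To make the two workarounds concrete, here is a minimal sketch in base R. The
object names (survey, wide, long) and the residence_ column prefix are only
for illustration, and the toy data is made up.

```r
# Toy data (made up for illustration): 'residence' is the multi-choice item
lvls <- c("US", "CA", "MX")
residence <- list("US", c("US", "CA"), "MX", c("MX", "US", "CA"))
survey <- data.frame(
  name = c("John", "Mary", "Jennifer", "Neil"),
  age  = c(34, 23, 40, 30),
  stringsAsFactors = FALSE
)

# Column-wise expansion: one 0/1 indicator column per level.
# Nothing in 'wide' records that the three columns belong to one question.
indicators <- t(sapply(residence, function(x) as.integer(lvls %in% x)))
colnames(indicators) <- paste0("residence_", lvls)
wide <- cbind(survey, indicators)

# Row-wise expansion: one row per checked answer, other items repeated.
n <- lengths(residence)
long <- data.frame(
  survey[rep(seq_len(nrow(survey)), n), ],
  residence = factor(unlist(residence), levels = lvls),
  row.names = NULL
)

# The duplication distorts summaries of the other items:
mean(survey$age)  # 31.75 over the 4 respondents
mean(long$age)    # 30 over the 7 expanded rows
```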

Is there a more natural data structure for putting a multi-choice item into
a data frame? Some code for illustration:

people <- list(
  name = c("John", "Mary", "Jennifer", "Neil"),
  gender = factor(c("M", "F", "F", "M")),
  age = c(34, 23, 40, 30),
  # multi-choice item: one factor per respondent, all sharing the same levels
  residence = lapply(list("US", c("US", "CA"), "MX", c("MX", "US", "CA")),
                     factor, levels = c("US", "CA", "MX"))
)
