jeroen00ms jeroen.ooms at stat.ucla.edu
Thu Jul 7 10:18:25 CEST 2011

I am working on a system to visualize survey responses. Survey responses
typically include factors, numerics, timestamps, and text fields, and therefore
fit nicely into data frames, making them easy to visualize using standard R

However, I am currently working on a survey that also includes questions in
which the respondent can check more than one answer on a single multi-choice
item. That is, the item represents a factor for which a row can have multiple
responses. I am looking for a way to put this into a data frame together with
the other questions of the survey.

I considered three workarounds, but all are problematic:

- Column-wise expanding: convert a single multi-choice item into N binary
columns, one for every possible response (level), with 1/0 values
indicating whether the answer was checked. The problem with this is that you
lose the information that these N columns are in fact one question, and it
becomes very hard to visualize this single question.

- Row-wise expanding: convert a single response into N rows, one for every
response. The problem with this is that if the factor is part of the data frame,
all of the other items also have to be duplicated, leading to artificial
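
To make the two expansions above concrete, here is a minimal sketch on toy
data. The names `lvls`, `wide` and `long` and the `residence_` column prefix
are made up for illustration only.

```r
# Toy data: each respondent may have checked several residences.
lvls <- c("US", "CA", "MX")
name <- c("John", "Mary", "Jennifer", "Neil")
residence <- list("US", c("US", "CA"), "MX", c("MX", "US", "CA"))

# Column-wise expansion: one 0/1 indicator column per level.
# sapply() returns a levels-by-respondents matrix, so transpose it.
wide <- data.frame(
  name = name,
  t(sapply(residence, function(r) as.integer(lvls %in% r))),
  stringsAsFactors = FALSE
)
names(wide)[-1] <- paste0("residence_", lvls)

# Row-wise expansion: one row per checked answer, so 'name' gets repeated.
long <- data.frame(
  name = rep(name, times = sapply(residence, length)),
  residence = factor(unlist(residence), levels = lvls),
  stringsAsFactors = FALSE
)
```

In `wide` nothing marks the three indicator columns as one question, and in
`long` Neil appears three times, which is the duplication described above.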

I was wondering if there is a more natural data structure for putting a
multi-choice item into a data frame. Some code for illustration:

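people <- list(
  name=c("John", "Mary", "Jennifer", "Neil"),
  # one factor per respondent, all sharing the same levels;
  # lapply() always returns a list, even if every respondent gave one answer
  residence=lapply(list("US", c("US", "CA"), "MX", c("MX", "US", "CA")),
                   factor, levels=c("US", "CA", "MX"))
)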
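
For reference, base R does accept a list as a data frame column when it is
wrapped in I(). The sketch below only shows that such an object can be built
and tabulated; whether it counts as a natural structure for plotting is
exactly the open question. The names `lvls` and `counts` are made up for
illustration.

```r
lvls <- c("US", "CA", "MX")

# I() stops data.frame() from trying to split the list into columns,
# so each cell of 'residence' holds a factor of length >= 1.
people <- data.frame(
  name = c("John", "Mary", "Jennifer", "Neil"),
  residence = I(lapply(
    list("US", c("US", "CA"), "MX", c("MX", "US", "CA")),
    factor, levels = lvls
  )),
  stringsAsFactors = FALSE
)

# Tabulate how often each level was checked across all respondents.
counts <- table(factor(
  unlist(lapply(people$residence, as.character)),
  levels = lvls
))
```

Many standard functions do not know what to do with a list column, so this
keeps the question together but does not by itself solve the visualization
problem.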

