[R] Basic question: Reading in multiple choice question responses to a single column in data frame

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed Aug 19 21:17:15 CEST 2009


You might look at the mChoice function in the Hmisc package for some 
indirect help.
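
For the splitting step itself, here is a minimal base R sketch that uses strsplit() rather than mChoice. It assumes the column holds only comma-delimited codes; the myColumn vector below is a hypothetical stand-in for myData[[question]], built from the example values in the question. The as.character() call is there because read.delim may have read the column as a factor.

```r
# Hypothetical stand-in for myData[[question]], copied from the example rows
myColumn <- c("1", "0", "2", "0,2", "0", "3", "2", "2,1")

# strsplit() returns one character vector per row;
# unlist() flattens them into a single vector with one entry per answer
answers <- unlist(strsplit(as.character(myColumn), ",", fixed = TRUE))

# Drop stray spaces around the codes, in case the data has "0, 2"
answers <- trimws(answers)

# Each selected answer is now counted once
counts <- table(answers)
print(counts)
```

Note that the link between an answer and its original row is lost here, which is fine for counts but not for cross-tabulating against other columns.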

Frank

Damion Dooley wrote:
> I'm using read.delim to successfully read in tab delimited data, but some
> columns' values are comma separated, reflecting the fact that the user chose
> a few answers on a multi-select question.  I understand that each answer is
> its own category and so could be represented as a separate column in the
> data set, but I'd like the option of reading in the data column and
> converting it to a vector in which every value (comma separated or not)
> has its own entry, so that the "table(columnData)" function counts
> correctly.
>  
> So some code:
>  
>     myData = read.delim(myDataFile, row.names=1, header=TRUE, skip=10)  # works fine
>     myColumn = myData[[question]]  # works fine, selects correct question column data
>  
> myColumn data is now e.g.:
>  
>     1
>     0
>     2
>     0,2
>     0
>     3
>     2
>     2,1
>  
> with the comma separated values looking like atomic string values, I guess.
> But I would like:
>  
>     1
>     0
>     2
>     0
>     2
>     0
>     3
>     2
>     2
>     1
>  
> I've tried various things, e.g. grep to recognize and expand the comma
> separated values, but since vector functions are at work, I can only put
> one value back into the myColumn data, e.g. a "0,2" entry becomes "0" or
> "2" if I use
>  
>     myColumn=gsub("^([0-9]+),([0-9]+)$",c('\\1'),myColumn,perl=TRUE) #or replace with c('\\2')
>  
> but I can't replace into c('\\1','\\2') 
>  
> Any elegant or otherwise ways to do this?
>  
> Much appreciated,
>  
> Damion


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



