[R] Data transformation & cleaning

Wed Sep 28 05:13:42 CEST 2011

Hi,

I have a few methodological and implementation questions for y'all. Thank
you in advance for your help. I have a dataset that reflects people's
preference choices. I want to see whether certain preference choices cluster
together (e.g. do people who pick choice A also pick choice D).

The data set has one record per user ID per preference choice, i.e. it is in
"long" form and looks like this:

ID | Page
123 | Choice A
123 | Choice B
456 | Choice A
456 | Choice B
...

I thought that I should do the following:

1. Make the data set "wide", counting the observations so the data looks
like this:
ID | Count of Choice A | Count of Choice B
123 | 1 | 1
...

Using dcast() from the reshape2 package:
table1 <- dcast(data, ID ~ Page, fun.aggregate = length, value.var = 'Page')

2. Create a correlation matrix of preferences
cor(table1[,-1])
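
To make steps 1 and 2 reproducible, here is a minimal sketch on made-up toy
data. The values are hypothetical, and base R's table() stands in for the
dcast() call so that it runs without reshape2; the name wide is just for
illustration.

```r
# Toy long-format data (made-up values): one row per ID per choice
data <- data.frame(
  ID   = c(123, 123, 456, 456, 789, 789, 321),
  Page = c("Choice A", "Choice B", "Choice A", "Choice B",
           "Choice A", "Choice C", "Choice C"),
  stringsAsFactors = FALSE
)

# Step 1: wide counts, one row per ID and one column per choice.
# unclass() turns the contingency table into a plain integer matrix.
wide <- unclass(table(data$ID, data$Page))

# Step 2: correlation matrix between the choice columns
cor_mat <- cor(wide)

print(wide)
print(round(cor_mat, 2))
```

Because the IDs end up as row names rather than as a first column, there is
no ID column to drop before calling cor() in this version.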

How would I restrict the correlation matrix to preferences that meet a
minimum sample threshold? Can you confirm whether the following two commands
do the same thing? And where would I go from here (or am I taking the wrong
approach)?
table1 <- dcast(data, Page ~ Page, fun.aggregate = length, value.var = 'Page')
table2 <- with(data, table(Page,Page))
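
To make the threshold question concrete, here is a sketch of what I have in
mind: drop any preference picked by fewer than some minimum number of users
before correlating. The toy data, the min_n value and the variable names are
all made up, and I am not sure this is the right way to go about it.

```r
# Toy long-format data (made-up values)
data <- data.frame(
  ID   = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  Page = c("Choice A", "Choice B", "Choice A", "Choice B",
           "Choice A", "Choice C", "Choice B", "Choice D", "Choice A"),
  stringsAsFactors = FALSE
)

min_n <- 2  # hypothetical threshold: minimum number of users per choice

# Wide count matrix: one row per ID, one column per choice
wide <- unclass(table(data$ID, data$Page))

# Number of distinct users who picked each choice at least once
n_users <- colSums(wide > 0)

# Keep only the choices that meet the threshold, then correlate
keep <- n_users >= min_n
cor_kept <- cor(wide[, keep, drop = FALSE])

print(n_users)
print(round(cor_kept, 2))
```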

many thanks,
Peter
