[R] Data transformation & cleaning
Jim Lemon
jim at bitwrit.com.au
Wed Sep 28 12:29:34 CEST 2011
On 09/28/2011 01:13 PM, pip56789 wrote:
> Hi,
>
> I have a few methodological and implementation questions for y'all. Thank
> you in advance for your help. I have a dataset that reflects people's
> preference choices. I want to see if there's any kind of clustering effect
> among certain preference choices (e.g. do people who pick choice A also pick
> choice D).
>
> I have a data set that has one record per user ID, per preference choice.
> It's a "long" form of a data set that looks like this:
>
> ID | Page
> 123 | Choice A
> 123 | Choice B
> 456 | Choice A
> 456 | Choice B
> ...
>
> I thought that I should do the following:
>
> 1. Make the data set "wide", counting the observations so the data looks
> like this:
> ID | Count of Preference A | Count of Preference B
> 123 | 1 | 1
> ...
>
> Using (from the reshape2 package)
> library(reshape2)
> table1 <- dcast(data, ID ~ Page, fun.aggregate = length, value.var = 'Page')
>
> 2. Create a correlation matrix of preferences
> cor(table1[, -1])  # drop the ID column, correlate the choice counts
>
> How would I restrict my correlation to show only preferences that meet a
> minimum sample threshold? Can you confirm whether the two following commands
> do the same thing? What would I do from here (or am I taking the wrong approach)?
> table1 <- dcast(data, Page ~ Page, fun.aggregate = length, value.var = 'Page')
> table2 <- with(data, table(Page, Page))
>
>
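The two steps in the question can also be sketched in base R. This is a minimal sketch, not the poster's or the reply's method: it assumes the long data frame has columns `ID` and `Page` as shown above, and the toy rows and the threshold `min_n` are hypothetical. `table()` stands in for the `dcast()` count table, the counts are turned into 0/1 indicators so a repeated pick by one ID counts once, `crossprod()` gives choice-by-choice co-occurrence counts (how many IDs picked both choices), and choices picked by fewer than `min_n` IDs are dropped before `cor()`. Note that `with(data, table(Page, Page))` cross-tabulates each row's Page against itself, so only its diagonal is non-zero; it does not count co-occurrence.

```r
# Hypothetical long-format data: one row per (ID, choice) pair
data <- data.frame(
  ID   = c(123, 123, 456, 456, 789, 789, 789, 321, 321),
  Page = c("Choice A", "Choice B", "Choice A", "Choice B", "Choice A",
           "Choice C", "Choice D", "Choice B", "Choice D")
)

# Step 1: wide ID-by-choice matrix (rows = IDs, columns = choices),
# converted to 0/1 so that repeated picks by one ID count once
picked <- 1 * (unclass(table(data$ID, data$Page)) > 0)

# Choice-by-choice co-occurrence: cell [i, j] = number of IDs that picked both
cooc <- crossprod(picked)

# Keep only choices picked by at least min_n IDs (hypothetical threshold)
min_n <- 2
keep <- colSums(picked) >= min_n

# Step 2: correlation matrix among the retained choices
cor_mat <- cor(picked[, keep, drop = FALSE])

print(cooc)
print(cor_mat)
```

With these toy rows, `cooc["Choice A", "Choice B"]` is 2 (IDs 123 and 456), and Choice C, picked by a single ID, is left out of the correlation. A Pearson correlation of 0/1 indicators is the phi coefficient, so this measures association between picks rather than between pick counts.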
Hi Peter,
An easy way to visualize set intersections is the intersectDiagram
function in the plotrix package. This will display the counts or
percentages of each type of intersection. Your data could be passed like
this:
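library(plotrix)
set.seed(1)  # make the random example reproducible
# 50 random (ID, choice) pairs: 20 possible IDs, 4 possible choices
choices <- data.frame(IDs = sample(1:20, 50, TRUE),
                      choice = sample(LETTERS[1:4], 50, TRUE))
intersectDiagram(choices)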
This example is a bit messy: sampling with replacement will generate quite a
few repeated ID/choice pairs, which intersectDiagram ignores, but it should
give you the idea.
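To look at the numbers behind such a diagram without plotting, the tally of exact choice combinations per ID can be sketched in base R. This is a hedged sketch of the idea, not plotrix's internal code: it rebuilds a random frame like the one above (the column name `choice` is an assumption), drops repeated ID/choice pairs, and counts how many IDs share each exact set of choices, which should be comparable to the intersection counts the diagram reports.

```r
# Simulated data in the same shape as the plotrix example (hypothetical)
set.seed(1)
choices <- data.frame(IDs = sample(1:20, 50, TRUE),
                      choice = sample(LETTERS[1:4], 50, TRUE))

# Drop repeated ID/choice pairs
choices <- unique(choices)

# For each ID, collapse its sorted distinct choices into one label, e.g. "A+C"
combo <- tapply(as.character(choices$choice), choices$IDs,
                function(x) paste(sort(unique(x)), collapse = "+"))

# Number of IDs in each exact combination of choices
combo_counts <- table(combo)
print(combo_counts)
```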
Jim