Uwe Ligges
ligges at statistik.tu-dortmund.de
Sun Jan 2 15:51:03 CET 2011
On 02.01.2011 02:28, Thorsten Biegner wrote:
> Hi
>
> The short version of my questions is this:
>
> How can I run a chi-square test over a matrix (table) to get the distances
> between rows and then run single linkage (or another fusion algorithm) over
> the resulting table?
>
> ------------
>
> The long-version of my question:
>
> My data consists of various indicators for different countries, for example
> how many people can read and write in countries X, Y, Z, along with
> percentages for each country. I want to find out which countries might be
> similar by doing a cluster analysis.
>
> So first I want to take the data which would look something like this:
>
>               Plastikbecher  Kartonbox  Papier
> Rama                     24         65      12
> Homa                     83         30      21
> Flora                    75         28      22
> SB                       35         55      21
> Holl. Butter             20         40      75
>
> And then run a chi-square test over it (I think that makes the most sense,
> or does anybody think differently?).
>
> So for that I will pair each row with every other row in a separate
> matrix (mat1) and use chisq.test.
>
> So mat1 would, for example, look like this:
>
>        Plastikbecher  Kartonbox  Papier
> Rama              24         65      12
> Flora             75         28      22
>
> And then I would run matResult[1,3]<- sqrt(chisq.test(mat1)[[1]])
>
> So in the end I would get a matrix like this:
>              Rama   Homa  Flora     SB  HollButter
> Rama        0.000  6.642  6.470  2.209       6.931
> Homa        6.642  0.000  0.430  4.994       8.387
> Flora       6.470  0.430  0.000  4.754       7.941
> SB          2.209  4.994  4.754  0.000       5.901
> HollButter  6.931  8.387  7.941  5.901       0.000
>
> So here is my question:
> How can I run a single linkage algorithm over this matrix?
>
> I thought a good starting point might be "hclust"
>
> hclust(d, method = "complete", members=NULL)
>
> But the R reference says d must be "a dissimilarity structure as produced by
> dist."
>
> But the dist function does not offer a chi-squared method or anything
> similar.
Well, there is as.dist, so just use:
hclust(as.dist(matResult), .......)
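
As a minimal sketch of the whole workflow, assuming the counts from the post are held in a matrix called tab (a name chosen here only for illustration), the pairwise statistics could be collected into matResult and handed to hclust roughly like this. Single linkage is requested with method = "single".

```r
# Counts from the original post (rows = brands, columns = packaging types).
# "tab" is a hypothetical name; the post does not name this matrix.
tab <- matrix(
  c(24, 65, 12,
    83, 30, 21,
    75, 28, 22,
    35, 55, 21,
    20, 40, 75),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("Rama", "Homa", "Flora", "SB", "HollButter"),
    c("Plastikbecher", "Kartonbox", "Papier")
  )
)

n <- nrow(tab)
matResult <- matrix(0, n, n, dimnames = list(rownames(tab), rownames(tab)))

# Fill the symmetric matrix with sqrt(X-squared) for every pair of rows,
# as described in the post; the diagonal stays at 0.
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    stat <- chisq.test(tab[c(i, j), ])$statistic
    matResult[i, j] <- matResult[j, i] <- sqrt(stat)
  }
}

# as.dist() turns the square matrix into the "dist" structure hclust expects.
d <- as.dist(matResult)
hc <- hclust(d, method = "single")
```

plot(hc) then draws the dendrogram. Note that hclust defaults to complete linkage, so single linkage has to be asked for explicitly.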
Uwe Ligges
> So does anybody have an idea how I can do a cluster analysis with a
> chi-squared test and then use a fusion algorithm to join the clusters?
>
> Thanks
>
> Thorsten
>
>
