# [R] Clusteranalysis Chi-square test and SingleLinkage

Uwe Ligges ligges at statistik.tu-dortmund.de
Sun Jan 2 15:51:03 CET 2011

```
On 02.01.2011 02:28, Thorsten Biegner wrote:
> Hi
>
> The short version of my questions is this:
>
> How can I run a chi-square test over a matrix (table) to get the distances
> between rows and then run a SingleLinkage (or other fusion algorithm) over
> the resulting table?
>
> ------------
>
> The long-version of my question:
>
> My data consist of various indicators for different countries, e.g. how
> many people can read or write in countries X, Y, Z, plus percentages for
> each country. I want to find out which countries might be similar by
> doing a cluster analysis.
>
> So first I want to take the data which would look something like this:
>
>               Plastikbecher Kartonbox Papier
> Rama                    24        65     12
> Homa                    83        30     21
> Flora                   75        28     22
> SB                      35        55     21
> Holl. Butter            20        40     75
>
> And then run a chi-square test over it (I think that makes the most sense or
> does anybody think something different)?
>
> So for that I will pair each row with every other row in a separate
> matrix (mat1) and use chisq.test.
>
> So mat1 would, for example, look like this:
>
>               Plastikbecher Kartonbox Papier
> Rama                    24        65     12
> Flora                   75        28     22
>
> And then I would run matResult[1,3]<- sqrt(chisq.test(mat1)[[1]])
>
> So in the end I would get a matrix like this:
>              Rama  Homa Flora    SB HollButter
> Rama       0.000 6.642 6.470 2.209      6.931
> Homa       6.642 0.000 0.430 4.994      8.387
> Flora      6.470 0.430 0.000 4.754      7.941
> SB         2.209 4.994 4.754 0.000      5.901
> HollButter 6.931 8.387 7.941 5.901      0.000
>
> So here is my question:
> How can I run a single linkage algorithm over this matrix?
>
> I thought a good starting point might be "hclust"
>
> hclust(d, method = "complete", members=NULL)
>
> But the R reference says d must be "a dissimilarity structure as produced by
> dist."
>
> But the dist function does not have a method chisquared-test or something
> similar.

Well, there is as.dist, so just use:

hclust(as.dist(matResult), .......)
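
# A minimal sketch of the whole workflow described above. Assumptions: the
# counts from the question are stored in a matrix called `tab` (a hypothetical
# name), and sqrt() of the Pearson chi-square statistic of each pair of rows
# is the intended dissimilarity. method = "single" is one possible choice for
# the fusion algorithm; any other hclust() method could be substituted.
tab <- matrix(c(24, 65, 12,
                83, 30, 21,
                75, 28, 22,
                35, 55, 21,
                20, 40, 75),
              ncol = 3, byrow = TRUE,
              dimnames = list(c("Rama", "Homa", "Flora", "SB", "HollButter"),
                              c("Plastikbecher", "Kartonbox", "Papier")))

n <- nrow(tab)
matResult <- matrix(0, n, n, dimnames = list(rownames(tab), rownames(tab)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    # 2 x 3 table built from rows i and j; fill both triangles of the matrix
    stat <- chisq.test(tab[c(i, j), ])$statistic
    matResult[i, j] <- matResult[j, i] <- sqrt(stat)
  }
}

# as.dist() converts the symmetric matrix into the "dist" object hclust() expects
hc <- hclust(as.dist(matResult), method = "single")
plot(hc)  # dendrogram of the fusion steps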

Uwe Ligges

> So does anybody have an idea how I can do a cluster analysis with a
> chi-squared test and then use a fusion algorithm to join the clusters?
>
> Thanks
>
> Thorsten