[R] Generating Hotelling's T squared statistic with hclust

David L Carlson dcarlson at tamu.edu
Fri Apr 8 20:13:09 CEST 2016


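As Burt pointed out, your plan is not advisable (that is putting it diplomatically), and it is not really an R question, but we can use R to show you why it is not advisable. What you are doing is inherently circular: you use the data to create groups, and then you test the groups against the same data you used to create them. The null hypothesis of Hotelling's two-sample T² test is that the two groups have the same mean vector, and its p-value is only meaningful if the group labels were assigned independently of the data. Groups produced by clustering violate that assumption by construction.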
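To check that the single example below is not a fluke of one seed, here is a rough simulation sketch in base R. The seed, the 200 replicates and the 12/13 random split are arbitrary assumptions, not part of the original example. It uses manova() with the Hotelling-Lawley test, which for exactly two groups is an exact F test equivalent to Hotelling's two-sample T², so DescTools is not needed:

```r
# Sketch only: 25 x 4 standard-normal data and Ward.D2 clustering as in the
# example; seed and replicate count are arbitrary assumptions.
set.seed(1)
n.rep <- 200

# p-value of a two-group MANOVA; with exactly two groups the Hotelling-Lawley
# test gives the same exact F test as Hotelling's two-sample T^2
t2.pvalue <- function(x, g) {
  fit <- manova(x ~ factor(g))
  summary(fit, test = "Hotelling-Lawley")$stats[1, "Pr(>F)"]
}

p.clust <- p.rand <- numeric(n.rep)
for (r in seq_len(n.rep)) {
  x <- matrix(rnorm(25 * 4), 25, 4)                          # pure noise
  g.clust <- cutree(hclust(dist(x), method = "ward.D2"), 2)  # groups derived from x
  g.rand  <- sample(rep(1:2, c(12, 13)))                     # groups independent of x
  p.clust[r] <- t2.pvalue(x, g.clust)
  p.rand[r]  <- t2.pvalue(x, g.rand)
}

# Proportion of "significant" results at the 5% level
mean(p.clust < 0.05)   # typically near 1: the test only confirms the split
mean(p.rand  < 0.05)   # near the nominal 0.05
```

With labels chosen independently of the data the test behaves as advertised; with labels chosen by hclust from the same data it rejects almost every time, even though every data set is pure noise.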

> set.seed(42)
> x <- matrix(rnorm(25*4), 25, 4)
> x.hcl <- hclust(dist(x), method="ward.D2")
> plot(x.hcl)

Now you have a dendrogram showing three nice-looking clusters built from completely random numbers. Unless the pseudo-random number generator is flawed, there is no structure in these data, yet the dendrogram looks plausible. Hotelling's two-sample T² test needs exactly 2 groups:

> grps <- cutree(x.hcl, 2)
> library(DescTools)
> HotellingsT2Test(x~grps)

        Hotelling's two sample T2-test

data:  x by grps
T.2 = 8.3476, df1 = 4, df2 = 20, p-value = 0.0003947
alternative hypothesis: true location difference is not equal to c(0,0,0,0)
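For anyone without DescTools, the same two-sample test can be sketched in base R directly from its definition, T² = (n1 n2 / (n1 + n2)) d' S⁻¹ d, where d is the difference in group mean vectors and S is the pooled within-group covariance matrix. The function name hotelling2() is hypothetical, and this is not DescTools' own code:

```r
# Hypothetical helper: two-sample Hotelling's T^2 from its definition
hotelling2 <- function(x, g) {
  g <- factor(g)
  stopifnot(nlevels(g) == 2)
  x1 <- x[g == levels(g)[1], , drop = FALSE]
  x2 <- x[g == levels(g)[2], , drop = FALSE]
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x)
  d <- colMeans(x1) - colMeans(x2)              # difference in mean vectors
  # pooled within-group covariance matrix
  S <- (crossprod(sweep(x1, 2, colMeans(x1))) +
        crossprod(sweep(x2, 2, colMeans(x2)))) / (n1 + n2 - 2)
  T2 <- (n1 * n2 / (n1 + n2)) * sum(d * solve(S, d))
  # rescale T^2 to an F statistic with p and n1 + n2 - p - 1 df
  df1 <- p; df2 <- n1 + n2 - p - 1
  Fstat <- df2 / (p * (n1 + n2 - 2)) * T2
  list(T2 = T2, F = Fstat, df1 = df1, df2 = df2,
       p.value = pf(Fstat, df1, df2, lower.tail = FALSE))
}

set.seed(42)
x <- matrix(rnorm(25*4), 25, 4)
grps <- cutree(hclust(dist(x), method="ward.D2"), 2)
hotelling2(x, grps)
```

On the seeded example this should give df1 = 4 and df2 = 20 as above. The degrees of freedom printed by HotellingsT2Test() suggest that its `T.2` is the F-scaled statistic, so `F` and `p.value` are the values to compare.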

No surprise: there is a significant difference between the groups. That just tells us that hclust() is working properly. It tells us nothing at all about any structure or pattern in the data (there is none). An equally bad (but surprisingly common) approach is to run a linear discriminant analysis on the cluster labels. Here we will use 3 groups:

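> grps <- cutree(x.hcl, 3)
> library(MASS)
> x.lda <- lda(x, grps)
> x.pre <- predict(x.lda)
> # group centroids on the discriminant axes (columns: Group.1, LD1, LD2)
> centers <- aggregate(x.pre$x, list(grps), mean)
> plot(x.lda)
> # dashed lines from each group centroid to the members of that group
> for (i in 1:3) { segments(centers[i, 2], centers[i, 3], 
+      x.pre$x[grps==i, 1], x.pre$x[grps==i, 2], lty=2)
+ }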

Now we have 3 well-separated clusters created from completely random data. Hierarchical clustering always creates clusters. It does not question the data you provide and it does not stop and refuse to continue if there are no clusters in the data.
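The same circularity shows up numerically. A rough simulation sketch (seed, replicate count and the random-label scheme are arbitrary assumptions, not part of the original example) compares LDA's resubstitution accuracy, i.e. predicting the same rows the model was fitted on, when the labels come from hclust on the same noise versus when they are assigned at random:

```r
library(MASS)   # lda(); MASS ships with R as a recommended package
set.seed(1)
n.rep <- 100

# resubstitution accuracy: fraction of rows whose predicted class equals the label
resub.acc <- function(x, g) {
  fit <- lda(x, g)
  mean(as.character(predict(fit)$class) == as.character(g))
}

acc.clust <- acc.rand <- numeric(n.rep)
for (r in seq_len(n.rep)) {
  x <- matrix(rnorm(25 * 4), 25, 4)                          # no real structure
  g.clust <- cutree(hclust(dist(x), method = "ward.D2"), 3)  # always 3 groups
  g.rand  <- sample(rep(1:3, length.out = 25))               # labels independent of x
  acc.clust[r] <- resub.acc(x, g.clust)
  acc.rand[r]  <- resub.acc(x, g.rand)
}

mean(acc.clust)   # typically high: LDA recovers the split hclust imposed
mean(acc.rand)    # much lower, usually somewhat above 1/3 from overfitting
```

Note that cutree() returned exactly 3 groups every time, whatever the data looked like; LDA then "confirms" them because they were carved out of the same data.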

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Michael
Sent: Friday, April 8, 2016 8:55 AM
To: r-help at r-project.org
Subject: [R] Generating Hotelling's T squared statistic with hclust

I am doing a cluster analysis with hclust. I want to get hclust to output the Hotelling's T squared statistic for each cluster so I can evaluate whether data points should be in a cluster or not. My research to answer this question has been unsuccessful. Does anyone know how to get hclust to output the Hotelling's T squared statistic for each cluster?


Mike




______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


