[R] statistical significance test for cluster agreement

Christian Hennig fm3a004 at math.uni-hamburg.de
Wed Mar 24 18:15:55 CET 2004


Dear Alexander,

On Wed, 24 Mar 2004, Alexander Sirotkin [at Yahoo] wrote:

> As you said, this kind of test will not give me
> anything that the Rand index does not, except for a p-value.
> 
> The null hypothesis, in my case, is that the clustering
> results do not match a different clustering that
> someone else did on the same data.

Usually, probability distributions (which you need in order to formulate
null hypotheses) are defined over data, not over different methods applied
to the same data. If you look at two clusterings of the same data, either
they are 100% identical or they are not. That's not a question of
significance.
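
To make the point concrete, here is a minimal sketch (in Python, with
made-up label vectors; the function names rand_index and same_partition are
only illustrative and do not come from any package) of how agreement between
two clusterings of the same data can be described. It gives a descriptive
number, not a p-value:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree.

    A pair counts as an agreement if both clusterings put the two points
    together, or both put them apart. Label names themselves do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def same_partition(labels_a, labels_b):
    """True if the two clusterings are identical up to relabelling."""
    return rand_index(labels_a, labels_b) == 1.0

# Hypothetical example labels, invented for illustration only.
a = [0, 0, 1, 1, 2, 2]
b = ["x", "x", "y", "y", "z", "z"]  # same partition as a, different names
c = [0, 0, 0, 1, 2, 2]              # one point moved to another cluster

print(same_partition(a, b))  # identical partitions
print(rand_index(a, c))      # descriptive agreement, no significance attached
```

Whether the two partitions are the same is a yes/no fact about the data at
hand; the Rand index only describes how far apart they are.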

What you seem to want is an assessment of the stability of a clustering of
given data by applying different cluster analyses, but this kind of
problem is not treated in terms of "significance". Different cluster
analyses do different things, and there is no reason to expect that their
results are the same apart from "random variation" (the only exception is
random variation from running the same algorithm, such as k-means, from
different random starting values - but that's not a problem to investigate
if you *know* the cluster analysis method that produced your clustering).
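
The one exception mentioned above, variation between runs of the same
algorithm from different random starting values, could be examined roughly
as follows. This is only a sketch under stated assumptions: the data are
simulated, the k-means routine is a plain hand-written Lloyd iteration in
Python rather than any particular library's implementation, and the names
kmeans and rand_index are illustrative.

```python
import random
from itertools import combinations

def rand_index(a, b):
    """Fraction of point pairs on which two label vectors agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def kmeans(points, k, rng, iterations=100):
    """Plain Lloyd's algorithm on 2-D points; returns one label per point."""
    centres = rng.sample(points, k)  # random starting values
    labels = None
    for _ in range(iterations):
        # Assign every point to its nearest centre (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: (p[0] - centres[c][0]) ** 2
                            + (p[1] - centres[c][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each centre to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels

# Simulated data (an assumption for illustration): three Gaussian groups.
data_rng = random.Random(1)
points = [(data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
          for cx, cy in [(0, 0), (5, 5), (0, 5)]
          for _ in range(30)]

# Same algorithm, same data, ten different random starts.
runs = [kmeans(points, 3, random.Random(seed)) for seed in range(10)]
scores = [rand_index(a, b) for a, b in combinations(runs, 2)]
print("lowest and highest pairwise Rand index:", min(scores), max(scores))
```

Pairwise values near 1 would indicate that the random starts hardly matter
for these data; this describes the algorithm's run-to-run variation only and
says nothing about agreement between different clustering methods.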

Christian


***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de



