[R] Can I compare two clusters without using their distance-matrix (dist()) ?

Christian Hennig chrish at stats.ucl.ac.uk
Wed Apr 21 19:16:58 CEST 2010


Dear Tal,

I took the definitions of Hubert's gamma and the Dunn index from the Gordon 
book. They are actually not about comparing two clusterings, at least not in 
that reference, and they require dissimilarities.

The adjusted Rand index and Meila's VI (variation of information), as 
implemented in cluster.stats, compare two clusterings. If you set 
compareonly=TRUE in cluster.stats, it computes only these two indexes, so 
in principle it doesn't need the dissimilarity matrix. In the next update 
I will probably change it so that in this case you don't need to provide 
a dissimilarity matrix.

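Until then, you can supply a noninformative matrix.
Example:
library(fpc)
# two arbitrary clusterings of 100 objects
c1 <- sample(4,100,replace=TRUE)
c2 <- sample(5,100,replace=TRUE)
# the all-zero matrix is only a placeholder; its values do not matter here
cs <- cluster.stats(d=matrix(0,ncol=100,nrow=100),c1,c2,compareonly=TRUE)

cs$corrected.rand  # adjusted Rand index
cs$vi              # Meila's variation of information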
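
Both indexes depend only on the cross-tabulation of the two label vectors,
so they can also be computed directly in base R without any dissimilarity
matrix. The following is a minimal sketch under stated assumptions, not the
fpc implementation: the function names adjusted_rand and
variation_of_information are made up for illustration, and the natural
logarithm is assumed for the VI, so values may differ from other
implementations by a constant factor if they use another log base.

```r
# Hypothetical helpers (not part of fpc); base R only.

# Adjusted Rand index of two label vectors a and b.
adjusted_rand <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)                          # contingency table of the labelings
  pairs <- function(x) sum(x * (x - 1) / 2)   # sum of choose(x, 2) over counts
  n_pairs  <- pairs(length(a))                # all pairs of objects
  together <- pairs(tab)                      # pairs joined in both clusterings
  row_p    <- pairs(rowSums(tab))             # pairs joined in a
  col_p    <- pairs(colSums(tab))             # pairs joined in b
  expected <- row_p * col_p / n_pairs         # expected value under random labeling
  (together - expected) / ((row_p + col_p) / 2 - expected)
}

# Variation of information: VI = 2*H(A,B) - H(A) - H(B), natural log assumed.
variation_of_information <- function(a, b) {
  stopifnot(length(a) == length(b))
  p <- table(a, b) / length(a)                # joint distribution of the labels
  entropy <- function(q) {
    q <- q[q > 0]                             # treat 0 * log(0) as 0
    -sum(q * log(q))
  }
  2 * entropy(p) - entropy(rowSums(p)) - entropy(colSums(p))
}

c1 <- sample(4, 100, replace = TRUE)
c2 <- sample(5, 100, replace = TRUE)
adjusted_rand(c1, c2)
variation_of_information(c1, c2)
```

Note that the adjusted Rand index is undefined (0/0) in this sketch if both
clusterings put all objects into a single cluster.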

Hope this helps,
Christian



On Wed, 21 Apr 2010, Tal Galili wrote:

> Thanks for the fast reply Uwe.
>
> My hope in posting this was to find if anyone had already done work (in R)
> in this direction.  So far I wasn't able to find any such relevant code, so
> I turned to the mailing list.
>
> Regarding new implementations - thanks for offering! - I have already come
> across one such algorithm - I implemented it, and will probably publish it
> on my blog <http://www.r-statistics.com/> in the near future.
>
> If anyone else has a reference to an R implementation, it would be most
> helpful,
> Tal
>
>
> ----------------Contact Details:----------------
> Contact me: Tal.Galili at gmail.com |  972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
> -------------------------------------------------
>
>
>
>
> 2010/4/21 Uwe Ligges <ligges at statistik.tu-dortmund.de>
>
>> On 21.04.2010 18:15, Tal Galili wrote:
>>
>>> Hello all,
>>>
>>> I would like to compare the similarity of two cluster solutions using a
>>> validation criterion (such as Hubert's gamma coefficient, the Dunn index,
>>> the corrected Rand index and so on).
>>>
>>> I see (from here: http://www.statmethods.net/advstats/cluster.html) that
>>> the function cluster.stats() in the fpc package provides a mechanism
>>> for comparing 2 cluster solutions - *BUT* - it requires me to give the
>>> distance matrix among objects.
>>>
>>> *My question* is: What ways can you suggest for comparing two cluster
>>> solutions using the cluster indicators only (i.e., a vector saying to
>>> which cluster each object belongs), WITHOUT having to submit the
>>> distance matrix between the objects?
>>>
>>
>> Don't know. If you have a theoretical solution and can provide the
>> description of a method, there will be many people around happy to make an
>> algorithm and implement it.
>>
>> Uwe Ligges
>>
>>
>>
>>> Thanks,
>>> Tal
>>>
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche


