[R] hclust with method = “ward”

Christian Hennig chrish at stats.ucl.ac.uk
Thu Oct 7 11:08:11 CEST 2010


On Wed, 6 Oct 2010, PeterB wrote:

> Thanks, Christian. This is really helpful.
>
> I was not aware of that equality, but now I can see it. I think you mean the
> inner sum over all distances in the distance matrix (for that cluster),
> which means that each distance is counted twice (which is why we divide by
> 2).

That's probably how to explain it... you can obviously check it by
writing the whole thing down, which is how I did it. (The formula is in
Bock's old book on "Automatische Klassifikation", but that's in German.)
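
For what it's worth, the check can also be done numerically. Below is a minimal sketch in Python rather than R; the function names wss_means and wss_dists and the toy points are made up for illustration and are not taken from hclust or fpc. It computes the within-cluster sum of squares once from the cluster means and once from squared pairwise distances only, summing over all ordered pairs (so each distance is counted twice) and dividing by twice the cluster size.

```python
from itertools import product


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wss_means(clusters):
    """Within-cluster sum of squares, computed around each cluster mean."""
    total = 0.0
    for pts in clusters:
        n = len(pts)
        dim = len(pts[0])
        mean = [sum(p[j] for p in pts) / n for j in range(dim)]
        total += sum(sq_dist(p, mean) for p in pts)
    return total


def wss_dists(clusters):
    """Same criterion, using only squared pairwise distances (no means)."""
    total = 0.0
    for pts in clusters:
        n = len(pts)
        # All ordered pairs: every distance appears twice, hence the factor 2.
        pair_sum = sum(sq_dist(p, q) for p, q in product(pts, repeat=2))
        total += pair_sum / (2 * n)
    return total


# Hypothetical toy data: two clusters in the plane.
clusters = [
    [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)],
    [(5.0, 5.0), (6.0, 7.0)],
]

a = wss_means(clusters)
b = wss_dists(clusters)
print(a, b)  # the two values should agree up to rounding
```

The second function never touches the coordinates except through sq_dist, so the same arithmetic could be applied to any squared dissimilarity matrix; whether that matches what hclust does internally is not something this sketch shows.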

Christian

>
> Peter
>
>
> Christian Hennig wrote:
>>
>> The k-means/Ward criterion can be written down in terms of squared
>> Euclidean distances in a way that doesn't involve means. It is half the
>> sum (over all clusters) of the sum (over all ordered pairs of observations
>> in a cluster) of the within-cluster squared dissimilarities, the inner sum
>> divided by the cluster size. This can also be computed for a general
>> dissimilarity matrix (this is for example done by cluster.stats in
>> package fpc).
>>
>> I'd guess that hclust with method="ward" uses this when run with a general
>> dissimilarity matrix. At least it would make sense, although I'm not sure
>> whether it really is what hclust does, because I didn't check the
>> underlying Fortran code.
>>
>> Note that I may have missed postings in this thread, so sorry if this
>> doesn't add to what you already have worked out.
>>
>> Christian
>>
>> On Wed, 6 Oct 2010, PeterB wrote:
>>
>>>
>>> Apparently, the same issue exists in SAS, where there is an option to run the
>>> Ward algorithm based only on the distance matrix. Perhaps a SAS user could
>>> confirm that or even check with SAS.
>>>
>>> Peter
>>>
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/hclust-with-method-ward-tp2952140p2965310.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> *** --- ***
>> Christian Hennig
>> University College London, Department of Statistical Science
>> Gower St., London WC1E 6BT, phone +44 207 679 1698
>> chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
>>
> --
> View this message in context: http://r.789695.n4.nabble.com/hclust-with-method-ward-tp2952140p2966045.html
> Sent from the R help mailing list archive at Nabble.com.
>

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche


