[R] hclust with method = “ward”
Christian Hennig
chrish at stats.ucl.ac.uk
Wed Oct 6 18:48:12 CEST 2010
The k-means/Ward criterion can be written down in terms of squared
Euclidean distances in a way that doesn't involve means. For each
cluster, sum the squared dissimilarities over all ordered pairs of
observations in that cluster (so each pair is counted twice) and divide
by the cluster size; the criterion is half the sum of these terms over
all clusters. This can also be computed for a general dissimilarity
matrix (this is for example done by cluster.stats in package fpc).
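
A minimal sketch of the identity described above, on made-up data. The
data, the partition cl and all variable names are assumptions chosen
for illustration only; the code just compares the criterion computed
from cluster means with the same quantity computed from pairwise
squared distances.

```r
# Toy data and an arbitrary partition (both purely illustrative)
set.seed(1)
x  <- matrix(rnorm(40), ncol = 2)       # 20 points in 2 dimensions
cl <- rep(1:3, length.out = nrow(x))    # three arbitrary clusters

# (a) Usual k-means criterion: squared deviations from the cluster means
wss_means <- sum(sapply(unique(cl), function(k) {
  g <- x[cl == k, , drop = FALSE]
  sum(sweep(g, 2, colMeans(g))^2)
}))

# (b) Same quantity from squared dissimilarities only, no means involved.
#     The full symmetric matrix counts every pair twice, hence the
#     division by 2 * (cluster size).
d2 <- as.matrix(dist(x))^2
wss_dist <- sum(sapply(unique(cl), function(k) {
  idx <- which(cl == k)
  sum(d2[idx, idx]) / (2 * length(idx))
}))

all.equal(wss_means, wss_dist)
```

Version (b) only needs the matrix d2, so it can be evaluated for any
dissimilarity matrix in place of squared Euclidean distances.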
I'd guess that hclust with method="ward" uses this when run with a general
dissimilarity matrix. At least it would make sense, although I'm not sure
whether it really is what hclust does, because I didn't check the
underlying Fortran code.
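
One way to probe this guess without reading the Fortran code is an
empirical check like the sketch below. It rests on several
assumptions: the method from this thread is called "ward.D" in current
R releases (older versions used "ward"), the input is taken to play
the role of the squared dissimilarities, and the helper crit() is a
hypothetical name for the criterion described above. A single toy
example is of course no proof.

```r
# Toy data with a non-Euclidean dissimilarity (squared Manhattan);
# everything here is illustrative.
set.seed(1)
x  <- matrix(rnorm(40), ncol = 2)
d2 <- dist(x, method = "manhattan")^2
hc <- hclust(d2, method = "ward.D")     # "ward" in older R versions

# Hypothetical helper: criterion from dissimilarities only
crit <- function(d, cl) {
  m <- as.matrix(d)
  sum(sapply(unique(cl), function(k) {
    idx <- which(cl == k)
    sum(m[idx, idx]) / (2 * length(idx))
  }))
}

n  <- nrow(x)
ks <- 1:n

# Criterion of the k-cluster partition read off the tree
crit_tree <- sapply(ks, function(k) crit(d2, cutree(hc, k = k)))

# Half the cumulative merge heights up to the same level
crit_heights <- sapply(ks, function(k) sum(hc$height[seq_len(n - k)]) / 2)

all.equal(crit_tree, crit_heights)
```

If the two vectors agree, each merge height equals twice the increase
in the criterion caused by that merge, which is what one would expect
if the guess above is right for this kind of input.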
Note that I may have missed postings in this thread, so sorry if this
doesn't add to what you already have worked out.
Christian
On Wed, 6 Oct 2010, PeterB wrote:
>
> Apparently, the same issue exists in SAS, where there is an option to run the
> Ward algorithm based only on the distance matrix. Perhaps, a SAS user could
> confirm that or even check with SAS.
>
> Peter
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche