[R] hclust, does order of data matter?

Christian Hennig chrish at stats.ucl.ac.uk
Mon Nov 15 23:16:24 CET 2010


I don't know how the hclust function is implemented, but in hierarchical 
clustering generally the result can be ambiguous if several distances in 
the dataset have identical values (or if identical between-cluster 
distances occur when aggregating clusters). The role of the order of the 
data depends on how these ambiguities are resolved. It may well be that 
if, at some point when building the hierarchy, there are two different 
possibilities to merge clusters at the same distance value, what hclust 
does is determined by the order.
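
A minimal sketch of how one might check this, using a made-up toy dataset 
(three points on a line, labelled a, b, c, so that two pairwise distances 
are tied). The data, the labels, and the choice of complete linkage are 
assumptions for illustration only. Whether the two runs actually disagree 
depends on how hclust resolves the tie, which I have not verified:

```r
# Hypothetical toy data: three points on a line, so d(a,b) = d(b,c) = 1.
x <- matrix(c(0, 1, 2), ncol = 1,
            dimnames = list(c("a", "b", "c"), NULL))

# Tied values in the distance matrix are what make the merge order ambiguous.
d <- dist(x)
has_ties <- anyDuplicated(as.vector(d)) > 0

# Cluster the same points in the original and in the reversed row order.
hc_fwd <- hclust(dist(x), method = "complete")
hc_rev <- hclust(dist(x[3:1, , drop = FALSE]), method = "complete")

# Cut both trees into two clusters and align the results by point label.
cl_fwd <- cutree(hc_fwd, k = 2)
cl_rev <- cutree(hc_rev, k = 2)[names(cl_fwd)]

# Two partitions agree if every pair of points is either together in both
# or apart in both (cluster numbers themselves are arbitrary).
same_partition <- all(outer(cl_fwd, cl_fwd, "==") ==
                      outer(cl_rev, cl_rev, "=="))

print(has_ties)
print(same_partition)
```

If has_ties is FALSE for your real data (and no tied between-cluster 
distances arise later), reordering alone should not change the hierarchy, 
and the difference is more likely to come from something else in the two 
files.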

Hope this helps,
Christian

On Mon, 15 Nov 2010, rchowdhury wrote:

>
> Hello,
>
> I am using the hclust function to cluster some data.  I have two separate
> files with the same data.  The only difference is the order of the data in
> the file.  For some reason, when I run the two files through the hclust
> function, I get two completely different results.
>
> Does anyone know why this is happening?  Does the order of the data matter?
>
> Thanks,
> RC

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche


