[R] Different cluster orderings from cutree() and cut.dendrogram()
Milan Bouchet-Valat
nalimilan at club.fr
Sun Aug 12 15:37:03 CEST 2012
Hi!
I just discovered that cutree() and cut.dendrogram() do not number
clusters the same way when called on the same tree. More specifically,
cutree() assigns cluster numbers by order of appearance in the data,
while cut.dendrogram() appears to sort clusters by height (see example
below). I guess this is for historical reasons?
I ran into this difference when trying to get a vector of cluster
memberships after running a hierarchical clustering. One solution would
be to avoid mixing methods for dendrogram and hclust objects. But I
don't know an easy/clean way of getting the same information that
cutree() provides using only dendrogram methods. Help is more than welcome!
I'd like to suggest that a note about this discrepancy be added
to ?cut.dendrogram and/or ?cutree. An example showing how to get cluster
memberships using only dendrogram methods could also be useful.
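One possible way to build a cutree()-style membership vector from dendrogram methods alone is sketched below. It is only a sketch: it assumes that labels() applied to each branch in cut()$lower returns that branch's leaf labels, and that renumbering clusters by order of first appearance in the data reproduces cutree()'s numbering as described above. The names branches, members, memb and memb_cutree are made up for the illustration.

```r
hc <- hclust(dist(USArrests))
dend <- as.dendrogram(hc)

# Sub-dendrograms below the cut, one per cluster
branches <- cut(dend, h = 100)$lower

# Leaf labels of each branch
members <- lapply(branches, labels)

# Cluster number = position of the branch in $lower
memb <- rep(seq_along(members), lengths(members))
names(memb) <- unlist(members)

# Put observations back in the original data order
memb <- memb[rownames(USArrests)]

# Assumption: cutree() numbers clusters by first appearance in the data,
# so renumber the dendrogram-based clusters the same way
memb_cutree <- match(memb, unique(memb))
names(memb_cutree) <- names(memb)

table(memb)
table(memb_cutree)
```

Here memb keeps the numbering implied by cut.dendrogram(), while memb_cutree is meant to be comparable with cutree(hc, h = 100).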
Example based on ?hclust:
> hc <- hclust(dist(USArrests))
> table(cutree(hc, h=100))
 1  2  3  4
14 14 20  2
> cut(as.dendrogram(hc), 100)$lower
[[1]]
'dendrogram' with 2 branches and 2 members total, at height 38.52791
[[2]]
'dendrogram' with 2 branches and 14 members total, at height 64.99362
[[3]]
'dendrogram' with 2 branches and 14 members total, at height 68.76227
[[4]]
'dendrogram' with 2 branches and 20 members total, at height 87.32634
(Note how the cluster sizes come out in a different order in the two outputs.)
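Going the other way, the cutree() numbers can be renumbered to follow the dendrogram. This is a sketch under the assumption that cut.dendrogram() returns its lower branches in the left-to-right leaf order of the tree, which hc$order records. That is consistent with the output above, but treat it as an assumption. The names cl, dend_ids and cl_dend are illustrative.

```r
hc <- hclust(dist(USArrests))
cl <- cutree(hc, h = 100)

# Cluster ids in the order they are met when reading leaves left to right
dend_ids <- unique(cl[hc$order])

# Renumber: the leftmost cluster becomes 1, the next one 2, and so on
cl_dend <- match(cl, dend_ids)
names(cl_dend) <- names(cl)

table(cl_dend)
```

With this renumbering, table(cl_dend) should list the cluster sizes in the same order as the branches of cut(as.dendrogram(hc), 100)$lower.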
Regards