[R] Problems with hclust and/or cutree.
Rolf Turner
r.turner at auckland.ac.nz
Fri May 30 02:33:13 CEST 2008
I have been attempting to do some work using hclust, and have run
into a (possibly subtle) problem.
The background is that I constructed a dissimilarity matrix ``d1''
(it involved something called the ``Jaccard similarity coefficient'';
I won't go into the details unless requested). I then did
d2 <- as.dist(d1)
try <- hclust(d2,method="ward")
plot(try,labels=FALSE)
After looking at the plot, I tried
mmm <- cutree(try,h=7)
and got the error message
Error in cutree(try, h = 7) :
the 'height' component of 'tree' is not sorted
(increasingly); consider applying as.hclust() first
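The error above can be provoked on a toy tree. This is a minimal sketch, assuming the check inside cutree() is an exact sortedness test on the height component. The data and the hand-made perturbation are hypothetical and have nothing to do with the real dissimilarity matrix.

```r
# Toy data: five points on a line; single linkage gives heights 1, 2, 4, 8
hc <- hclust(dist(c(1, 2, 4, 8, 16)), method = "single")

# Hypothetical perturbation: push the second height a hair below the first,
# mimicking a tiny floating-point dip in an otherwise sorted sequence
hc$height[2] <- hc$height[1] - 1e-12

# cutree() with h= should now refuse; capture the message instead of stopping
res <- tryCatch(cutree(hc, h = 3), error = function(e) conditionMessage(e))
print(res)
```

The exact wording of the message may differ between R versions.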
I was much puzzled by this initially, since try is already an
``hclust'' object (I checked class(try)), but after a substantial
amount of hair-tearing I discovered that the entries of the height
component of try are constant over long stretches. E.g. the first 54
entries are 0 (to the 7 printed decimal places). This doesn't *seem*
to be cause for alarm: the help says explicitly that height is a
*non-decreasing* sequence (but not necessarily a strictly increasing
one).
I checked
with(try,all.equal(height,sort(height)))
and got
[1] TRUE
but order(try$height) is NOT equal to 1:745 (note that 746 is the
number of subjects in the data set, so height has 745 entries).
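One possible reading of these two checks, sketched on made-up numbers rather than the real heights: all.equal() compares within a numerical tolerance, whereas order() and is.unsorted() compare exactly. A dip far below the printed precision would then pass the first check and fail the second. The vector h is hypothetical.

```r
# Hypothetical heights: nominally non-decreasing, with one dip of size 1e-12
h <- c(0, 1e-12, 0, 0.5, 1)

# all.equal() uses a tolerance (about 1.5e-8 by default), so the dip is ignored
tol_equal <- isTRUE(all.equal(h, sort(h)))

# order() and is.unsorted() compare exactly, so they do see the dip
perm <- order(h)
exactly_unsorted <- is.unsorted(h)

print(tol_equal)
print(perm)
print(exactly_unsorted)
```

With exact ties alone, order() is stable and would return 1:n, so a permutation other than 1:n points to genuinely different (if tiny) values.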
I have done an RSiteSearch() on "cutree" and turned up nothing that
seemed relevant.
Finally, I found that if I do
try$height <- round(try$height,6)
then
mmm <- cutree(try,h=7)
``works'' (without error).
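Before relying on the rounding, it may help to look at how large the backward steps in height actually are. The following is a rough sketch on a hypothetical vector h standing in for try$height. The cummax() line is a swapped-in alternative to rounding, not something taken from the post above.

```r
# Hypothetical stand-in for try$height
h <- c(0, 1e-12, 0, 0.5, 1)

# Most negative step between consecutive heights; 0 or positive means sorted
worst_drop <- min(diff(h))

# The rounding used above: collapses differences smaller than about 1e-6
h_round <- round(h, 6)

# Alternative (an assumption, not from the post): running maximum, which
# removes the dips and leaves every other value untouched
h_cummax <- cummax(h)

print(worst_drop)
print(is.unsorted(h_round))
print(is.unsorted(h_cummax))
```

If the worst drop is many orders of magnitude smaller than the cut height, either repair should leave the result of cutree() at h=7 unchanged; rounding additionally merges any heights that genuinely differ by less than the rounding precision.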
Are there traps for young players in employing such a strategy? What
should I really worry about?
If anyone wants to try it for themselves with the real distance
matrix, I can bundle it up and email it to them privately.
Thanks for any insights.
cheers,
Rolf Turner