[R] cutree with agnes
Martin Maechler
maechler at stat.math.ethz.ch
Mon Dec 15 15:14:00 CET 2003
[diverted from R-help to R-devel; please follow up on R-devel!]
>>>>> "ChrisH" == Christian Hennig <fm3a004 at math.uni-hamburg.de>
>>>>> on Thu, 11 Dec 2003 15:13:52 +0100 (MET) writes:
ChrisH> Hi, this is rather a (presumed) bug report than a
ChrisH> question because I can solve my personal statistical
ChrisH> problem by working with hclust instead of agnes.
ChrisH> I have done a complete linkage clustering on a dist
ChrisH> object dm with 30 objects with agnes (R 1.8.0 on
ChrisH> RedHat) and I want to obtain the partition that
ChrisH> results from a cut at height=0.4.
ChrisH> I run
>> cl1a <- agnes(dm, method="complete"); cutree(cl1a,h=0.4)
ChrisH> [1] 1 2 3 4 5 6 3 7 3 8 9 10 3 11 12 13 14 15 3 16 17 3 18 19 20
ChrisH> [26] 21 3 22 18 23
ChrisH> But that's not true; correct is the solution
ChrisH> obtained from hclust
>> clx <- hclust(dm); cutree(clx,h=0.4)
ChrisH> [1] 1 2 1 2 3 4 1 2 1 3 4 5 1 4 6 7 8 4 1 5 2 1 9 2 2
ChrisH> [26] 10 1 9 9 11
ChrisH> as can be seen from the dendrogram plots of hclust
ChrisH> *and* agnes. (Note that the dendrograms of hclust
ChrisH> and agnes are not identical due to the handling of
ChrisH> ties in the distances, but the difference between
ChrisH> the agnes and hclust dendrogram at h=0.4 concerns
ChrisH> only two points.) Specifying k instead of h in
ChrisH> cutree for agnes seems to work properly, but that's
ChrisH> not what I need in the general case.
If I look up the help pages for cutree, agnes and agnes.object,
nothing says that you can expect cutree to work with agnes
objects directly.
On the contrary, ?cutree says about its first argument
    tree: a tree as produced by 'hclust'. 'cutree()' only expects a
          list with components 'merge', 'height', and 'labels', of
          appropriate content each.
and ?agnes.object mentions the as.hclust() function that's
needed to produce an "hclust"-like object from the result of
agnes() {or diana()}.
Summarizing,
1) You need
    cutree(as.hclust(cl1a), h=0.4)
2) cutree() shouldn't silently return a wrong result for agnes
(or diana) objects. Rather, it should return the proper thing
or give an error.
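To make point 1) concrete, here is a minimal sketch of the conversion.
The original 'dm' is not available, so the distance matrix below is an
assumption (30 rows of the built-in USArrests data), and the cut height
of 2 is chosen arbitrarily for that data, not taken from the report.

```r
library(cluster)  # provides agnes() and the as.hclust() method for "twins" objects

# Stand-in for the poster's 'dm': distances between 30 objects (assumed data)
dm <- dist(scale(USArrests[1:30, ]))

ag <- agnes(dm, method = "complete")   # agnes result, class c("agnes", "twins")
hc <- hclust(dm, method = "complete")  # reference result from hclust

# Convert the agnes result to an "hclust" object before cutting by height
cl_ag <- cutree(as.hclust(ag), h = 2)
cl_hc <- cutree(hc, h = 2)

# Cross-tabulate the two partitions to compare them
print(table(agnes = cl_ag, hclust = cl_hc))
```

When the distances contain no ties, the two partitions should agree; with
ties, the dendrograms themselves can differ, as noted in the report.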
Here I elaborate a bit on "2)" which is not entirely trivial --
hence the diversion to R-devel.
The best approach would be to make cutree() a generic function
with the `obvious' "hclust" & "twins" methods and a "default"
method which just uses something like NextMethod( as.hclust() ..).
However this breaks back-compatibility: cutree() may not work
anymore on user-constructed objects that are just list()s as
described for `tree' above.
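A rough sketch of what such a generic might look like follows. It is not
the actual R implementation: the name 'cutree2' is hypothetical (chosen
so that stats::cutree is not masked), and the data are the same assumed
stand-in as before. The last lines illustrate the compatibility concern:
a bare list() with the right components is rejected by the default
method, because as.hclust() has no way to coerce it.

```r
library(cluster)

# Hypothetical generic; 'cutree2' is an illustrative name only
cutree2 <- function(tree, k = NULL, h = NULL, ...) UseMethod("cutree2")

# "hclust" objects go straight to the existing implementation
cutree2.hclust <- function(tree, k = NULL, h = NULL, ...)
    stats::cutree(tree, k = k, h = h)

# agnes() and diana() results inherit from "twins": convert first
cutree2.twins <- function(tree, k = NULL, h = NULL, ...)
    stats::cutree(as.hclust(tree), k = k, h = h)

# Anything else must be coercible via as.hclust(), or this fails
cutree2.default <- function(tree, k = NULL, h = NULL, ...)
    stats::cutree(as.hclust(tree), k = k, h = h)

dm <- dist(scale(USArrests[1:30, ]))  # assumed stand-in data
ag <- agnes(dm, method = "complete")
print(cutree2(ag, k = 3))             # dispatches to cutree2.twins

# A user-constructed plain list, as permitted by ?cutree today
plain <- unclass(hclust(dm))
res <- try(cutree2(plain, k = 3), silent = TRUE)
print(inherits(res, "try-error"))     # the default method refuses the bare list
```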
We could alleviate this problem by trying to make
as.hclust.default() much smarter, but I would rather not do that
and instead let other people write their own as.hclust.* methods
for their own constructed objects.
Does this seem viable? If I don't hear protest, I'll eventually
try to do this (in R-devel).
Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><