[R] cutree with agnes

Martin Maechler maechler at stat.math.ethz.ch
Mon Dec 15 15:14:00 CET 2003


       [diverted from R-help to R-devel; please follow up on R-devel!]

>>>>> "ChrisH" == Christian Hennig <fm3a004 at math.uni-hamburg.de>
>>>>>     on Thu, 11 Dec 2003 15:13:52 +0100 (MET) writes:

    ChrisH> Hi, this is rather a (presumed) bug report than a
    ChrisH> question because I can solve my personal statistical
    ChrisH> problem by working with hclust instead of agnes.

    ChrisH> I have done a complete linkage clustering on a dist
    ChrisH> object dm with 30 objects with agnes (R 1.8.0 on
    ChrisH> RedHat) and I want to obtain the partition that
    ChrisH> results from a cut at height=0.4.

    ChrisH> I run

    >> cl1a <- agnes(dm, method="complete");  cutree(cl1a,h=0.4)
    ChrisH>  [1] 1 2 3 4 5 6 3 7 3 8 9 10 3 11 12 13 14 15 3 16 17 3 18 19 20
    ChrisH> [26] 21 3 22 18 23

    ChrisH> But that's not true; correct is the solution
    ChrisH> obtained from hclust
    >> clx <- hclust(dm); cutree(clx,h=0.4)
    ChrisH>  [1] 1 2 1 2 3 4 1 2 1 3 4 5 1 4 6 7 8 4 1 5 2 1 9 2 2
    ChrisH> [26] 10 1 9 9 11

    ChrisH> as can be seen from the dendrogram plots of hclust
    ChrisH> *and* agnes.  (Note that the dendrograms of hclust
    ChrisH> and agnes are not identical due to the handling of
    ChrisH> ties in the distances, but the difference between
    ChrisH> the agnes and hclust dendrogram at h=0.4 concerns
    ChrisH> only two points.)  Specifying k instead of h in
    ChrisH> cutree for agnes seems to work properly, but that's
    ChrisH> not what I need in the general case.

If I look up the help pages for cutree, agnes and agnes.object,
nothing says that you can expect cutree to work with agnes
objects directly.
On the contrary,  ?cutree  says about its first argument

    tree: a tree as produced by 'hclust'. 'cutree()' only expects a
          list with components 'merge', 'height', and 'labels', of
          appropriate content each.

and   ?agnes.object  mentions the   as.hclust() function that's
needed to produce an "hclust"-like object from the result of
agnes() {or diana()}.

Summarizing,
  1) You need      
     cutree(as.hclust(cl1a), h=0.4)

  2) cutree() shouldn't silently return a wrong result for agnes
     (or diana) objects.  Rather, it should return the proper thing
     or give an error.
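
The fix in 1) can be sketched as follows. This is only a minimal
example, not the original poster's data: the USArrests subset, the
object names and the cut height are assumptions, chosen so that the
snippet runs on its own.

```r
library(cluster)  # agnes() and the as.hclust() method for its result

# Assumed stand-in data: 30 rows of a built-in data set, not the poster's 'dm'
dm <- dist(scale(USArrests[1:30, ]))

ag <- agnes(dm, method = "complete")
hc <- hclust(dm, method = "complete")

# Cut halfway between two consecutive merge heights,
# so that no merge sits exactly on h
hs <- sort(hc$height)
h  <- mean(hs[15:16])

part_ag <- cutree(as.hclust(ag), h = h)  # convert first, then cut
part_hc <- cutree(hc, h = h)

# Cross-tabulate the two partitions to compare them
print(table(agnes = part_ag, hclust = part_hc))
```

When the distances contain no ties, one would expect the two
partitions to agree, so each row and column of the table should have a
single non-zero cell.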

Here I elaborate a bit on "2)" which is not entirely trivial --
hence the diversion to R-devel.
The best approach would be to make cutree() a generic function
with the `obvious' "hclust" & "twins" methods and a "default"
method which just uses something like NextMethod( as.hclust() ..).

However this breaks back-compatibility: cutree() may not work
anymore on user-constructed objects that are just list()s as
described for `tree' above.
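
A rough sketch of what such a generic could look like, and of how it
breaks plain lists. The name cutree2 is hypothetical (used only to
avoid masking the existing cutree()), and the "twins" and "default"
methods re-dispatch explicitly after as.hclust() instead of using
NextMethod(); this is an illustration, not the eventual implementation.

```r
# Hypothetical generic; 'cutree2' is an illustrative name only
cutree2 <- function(tree, ...) UseMethod("cutree2")

# "hclust" objects are passed straight to the existing function
cutree2.hclust <- function(tree, k = NULL, h = NULL, ...)
  stats::cutree(tree, k = k, h = h)

# agnes() and diana() results inherit from "twins": convert, then re-dispatch
cutree2.twins <- function(tree, ...) cutree2(as.hclust(tree), ...)

# Everything else must be convertible by some as.hclust() method
cutree2.default <- function(tree, ...) cutree2(as.hclust(tree), ...)

hc <- hclust(dist(USArrests))
part <- cutree2(hc, k = 3)           # dispatches to the "hclust" method

# A plain list with the right components now fails in as.hclust.default(),
# which is the back-compatibility problem
plain <- unclass(hc)
res <- tryCatch(cutree2(plain, k = 3), error = function(e) conditionMessage(e))
print(res)
```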

We could alleviate this problem by trying to make
as.hclust.default() much smarter, but I would rather not do
that, and instead let other people write their own as.hclust.*
methods for their own constructed objects.
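
For illustration, a user-written as.hclust.* method might look like
the sketch below. The class name "mytree" and its layout (a list with
components merge, height, order and labels) are assumptions made up for
this example; the explicit registerS3method() call is there only so
that dispatch works wherever the code is evaluated.

```r
# Assumed user-constructed object: a plain list of class "mytree",
# built here from an hclust() result just to have valid content
hc <- hclust(dist(USArrests))
mt <- structure(unclass(hc)[c("merge", "height", "order", "labels")],
                class = "mytree")

# Hypothetical conversion method for the hypothetical class
as.hclust.mytree <- function(x, ...) {
  structure(list(merge = x$merge, height = x$height, order = x$order,
                 labels = x$labels, call = match.call(),
                 method = NA_character_, dist.method = NA_character_),
            class = "hclust")
}
registerS3method("as.hclust", "mytree", as.hclust.mytree)

# Convert explicitly, then cut
part <- cutree(as.hclust(mt), k = 4)
```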

Does this seem viable?  If I don't hear protest, I'll eventually
try to do this (in R-devel).

Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><



