[R] Any functions to manipulate (merge, cut, remove) hclust objects? (maybe through phylo?)

Martin Maechler maechler at stat.math.ethz.ch
Wed Dec 29 15:01:37 CET 2010

```>>>>> Tal Galili <tal.galili at gmail.com>
>>>>>     on Wed, 29 Dec 2010 14:08:26 +0200 writes:

> Hello Martin,
> Thank you for the reference to the "cut" option in the dendrogram help page!
> I guess I was too focused on looking for a solution for the hclust object
> to think that such a method existed for dendrograms.

> The cut.dendrogram() method doesn't solve my problem yet, since what I'm looking for
> is the output of something like:
> cutree(hc.object, k = 3)

> which is a vector indicating which cluster each item belongs to.

indeed; and that's only indirectly the result of a cut(*, h = .)
call.

BTW: cutree() internally translates an 'h = *' specification into a
'k = *' one, which is actually a bit peculiar: a cut at a given height
is well-defined, but a cut into a given number of clusters may *NOT*
be well-defined when two sub-branches have exactly the same height 'h'.
Going from  h  to  'h - eps'  then adds *two* new clusters at once,
i.e., a step  k --> k+2,  so cutree(*, k+1) is not really well-defined.
In that case the cutree() internal algorithm uses the (somewhat)
arbitrary order of the merges to define the grouping.

Given all the above, I now tend to think that it may indeed be
most fruitful to provide an as.hclust.dendrogram() method, rather
than just implementing a cut()-based cutree method for dendrograms.

> And for some reason I can't seem to understand the structure of the
> dendrogram object using "str".

Yes; there's a str.dendrogram() method which nicely shows the
structure of a dendrogram; however, if you really want to see the
internal structure, you need  str(unclass( . ))

> But I'll read some more and write back if I can't solve it.

> p.s.: If I succeed in writing something useful, it will be
> my pleasure and honor to contribute it back to the R-project :)

Cool.
Actually, I now think merge() is by far the easier part compared
with cutree() / as.hclust.dendrogram(), though the latter should not
be very hard either.

As I'm officially on vacation at the moment, I may have some fun
helping with these...

Martin

> On Wed, Dec 29, 2010 at 1:49 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

>> >>>>> Tal Galili <tal.galili at gmail.com>
>> >>>>>     on Wed, 29 Dec 2010 13:32:26 +0200 writes:
>>
>> > Hello Martin,
>> > Thank you for replying.
>>
>> > I have two needs:
>>
>> > 1) To merge two dendrograms into one.
>>
>> > 2) To then run cutree on it (which works on hclust, but
>> >    not on dendrogram).
>>
>> Well, but cut() does and is prominently mentioned on the
>> dendrogram help page (and its examples)
>>
>> > I guess that if I knew how to perform both steps I would be able to do what
>> > I'm trying to do on my data.
>> > If nothing like this currently exists, I guess I'll simply implement a
>> > method of cutree for a dendrogram, and see how to merge two dendrograms
>> > together.
>>
>> so you only need to program the merge / join part.
>>
>> I did not take the time to understand what exactly you mean by
>> that, but as there is no function to do that with "hclust" either,
>> I'm convinced you should rather write one for "dendrogram"
>> indeed; as merge() is already "S3 generic", I'd call it
>> merge.dendrogram()
>>
>> If you end up finding it useful and are willing to write a help
>> page (including examples!) for it, you may consider donating it
>> back to the R-project ... ;-)
>>
>> Regards, Martin
>>

```
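The points discussed above (cutree() membership vectors, cut() on a dendrogram, the ambiguity of a k-based cut when merge heights are tied, and str() versus str(unclass(.))) can be illustrated with a small base-R sketch. It assumes the built-in USArrests data and a toy vector `x` purely for illustration; `leaf_labels()`, `membership_from_cut()` and `same_partition()` are hypothetical helpers written for this example, not functions from stats, and this is not the implementation proposed in the thread.

```r
## Minimal sketch, base R only ('stats' is attached by default).
## Part 1: membership vectors from hclust vs. from a dendrogram cut.
hc   <- hclust(dist(USArrests), method = "average")
dend <- as.dendrogram(hc)

## cutree() on the hclust object: one cluster id per observation
grp_k <- cutree(hc, k = 3)

## Hypothetical helper: collect the leaf labels below a dendrogram node
leaf_labels <- function(node) {
  if (is.leaf(node)) attr(node, "label")
  else unlist(lapply(unclass(node), leaf_labels))
}

## Hypothetical helper: turn cut(<dendrogram>, h = .)$lower into a
## cutree()-style named membership vector (group ids follow branch order)
membership_from_cut <- function(dend, h) {
  lower <- cut(dend, h = h)$lower
  labs  <- lapply(lower, leaf_labels)
  setNames(rep(seq_along(labs), lengths(labs)), unlist(labs))
}

## Cut halfway between the 2nd- and 3rd-highest merges
h       <- mean(rev(hc$height)[2:3])
grp_h   <- cutree(hc, h = h)
grp_cut <- membership_from_cut(dend, h)[names(grp_h)]

## Hypothetical helper: same partition, ignoring how groups are numbered
same_partition <- function(a, b)
  nrow(unique(cbind(a, b))) == length(unique(a)) &&
    length(unique(a)) == length(unique(b))
same_partition(grp_h, grp_cut)

## Part 2: tied merge heights make cutree(*, k) order-dependent
x <- c(a = 0, b = 1, c = 10, d = 11)  # two pairs, both at distance 1
hc_tie <- hclust(dist(x))              # complete linkage (the default)
hc_tie$height                          # first two merges are tied at 1
max(cutree(hc_tie, h = 1))             # 2 groups
max(cutree(hc_tie, h = 1 - 1e-8))      # 4 groups: the step k --> k+2
cutree(hc_tie, k = 3)                  # 3 groups, picked by merge order

## Internal structure of a (small) dendrogram
str(as.dendrogram(hc_tie))             # summary view from the str() method
str(unclass(as.dendrogram(hc_tie)))    # raw nested list with attributes
```

The partitions are compared by pairing group ids rather than by the ids themselves, because cutree() and the hypothetical helper may number the same groups differently.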