[R] extracting groups from hclust() for a very large matrix

Milan Bouchet-Valat nalimilan at club.fr
Fri Oct 12 10:06:06 CEST 2012

```On Thursday 11 October 2012 at 15:50 -0700, Christopher R. Dolanc wrote:
> Hello,
>
> I'm having trouble figuring out how to see resulting groups (clusters)
> from my hclust() output. I have a very large matrix of 4371 plots and 29
> species, so simply looking at the graph is impossible. There must be a
> way to 'print' the results to a table that shows which plots were in
> what group, correct?
>
> I've attached the matrix I'm working with (the whole thing since the
> point is its large size).
I can't see it (the server probably removed it). Anyway, you should be
able to reproduce the problem with a small example: I don't see
anything below that depends on the matrix being large, except perhaps
the vegemite() error.

> I've been able to run the following code to
> get the groups I need:
>
>  > VTM.Dist<- vegdist(VTM.Matrix)
>  > VTM.HClust<- hclust(VTM.Dist, method="ward")
>  > plot(VTM.HClust, hang=-1)
>
> It takes a while, but it does run. Then, I can extract 8 groups, which
> I'd like to experiment with, but is about how many I'd like:
>
> rect.hclust(VTM.HClust, 8)
>  > VTM.8groups<- cutree(VTM.HClust, 8)
>
> But, instead of listing the plots by name, it only tells me *how many*
> plots are in the eight groups:
>
>  > table(VTM.8groups)
> VTM.8groups
>     1    2    3    4    5    6    7    8
>   137  173  239  356  709  585  908 1264
Just remove the call to table(). That function exists precisely to tell
you how many times each value (here, each group) occurs. The list of
plots and their groups is the object returned by cutree() itself:
VTM.8groups
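
# A minimal sketch of the point above, not the original analysis. It
# assumes the built-in USArrests data as a stand-in for VTM.Matrix
# (which is not available here) and uses dist() in place of vegdist().
# The names hc, groups, members and group_table are illustrative only.
hc <- hclust(dist(USArrests), method = "average")
groups <- cutree(hc, k = 4)      # named integer vector: one group per row
head(groups)                     # row names with their group number
# One character vector of row names per group:
members <- split(names(groups), groups)
members[["1"]]                   # the rows assigned to group 1
# The same information as a two-column table, e.g. for write.csv():
group_table <- data.frame(plot = names(groups), group = groups,
                          row.names = NULL)
# This relies on the input having row names; without them, names(groups)
# is NULL and seq_along(groups) can be used in its place.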

> The vegemite() function also doesn't work for this reason - I have way
> too many plots so they number in the thousands, which vegemite doesn't like.
>
>  > vegemite(VTM.Matrix, VTM.HClust)
> Error in vegemite(VTM.Matrix, VTM.HClust) :
>    Cowardly refusing to use longer than 1 char symbols:
> Use scale
>
> Does anybody know how I can get a simple list of plots in each category?
> I would think this would be something like a summary command. Perhaps a
> different clustering method?
>
> Thanks,
> Chris Dolanc