[R] Manually modifying an hclust dendrogram to remove singletons

ilai keren at math.montana.edu
Fri May 25 05:26:31 CEST 2012


Can't put my finger on it, but something about your idea rubs me the
wrong way. Maybe it's that the tree depends on the hierarchical
clustering algorithm, and the choice of how to trim it should be based
on something more defensible than "avoid singletons". In this example
Hawaii is really different from New Hampshire; why would you want them
clustered together?

But it's your work, field of study, whatever. If you are going to do
it anyway, one way would be to loop over cut heights from the lowest
up and stop at the first height where no cluster is a singleton:

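 hc <- hclust(dist(USArrests), "ave")
 plot(hc)                     # rect.hclust() draws on this plot
 hr <- range(hc$height)
 tol <- diff(hr)/100          # step between candidate cut heights
 for(i in seq(1e-4 + hr[1], hr[2], tol)){
   hcc <- rect.hclust(hc, h = i)           # list of clusters at height i
   if(all(sapply(hcc, length) > 1)) break  # stop once no singletons remain
 }
 str(hcc)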

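# or, if you prefer a dendrogram (reuses hr and tol from above)
 dend1 <- as.dendrogram(hc)
 for(i in seq(1e-4 + hr[1], hr[2], tol)){
   dend2 <- cut(dend1, h = i)   # list with $upper and $lower parts
   # stop once every lower branch has more than one member
   if(all(sapply(dend2$lower, function(x) attr(x, 'members')) > 1)) break
 }
 dend2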
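
If you would rather not draw a rectangle for every candidate height,
the same search can be phrased in terms of the number of clusters with
cutree(). This is only a sketch of that variant, not a different
method: ks, min_size, min_n and best_k are names made up here for
illustration, and min_n = 2 is an assumed threshold ("no singletons").

```r
hc <- hclust(dist(USArrests), "ave")

# cutree() accepts a vector of k and returns one column per value of k
ks   <- 2:(nrow(USArrests) - 1)
memb <- cutree(hc, k = ks)

# size of the smallest cluster for each candidate k
min_size <- apply(memb, 2, function(m) min(table(m)))

min_n  <- 2                      # assumed minimum cluster size
ok     <- ks[min_size >= min_n]  # values of k with no too-small cluster
best_k <- if (length(ok)) max(ok) else 1   # fall back to a single cluster

groups <- cutree(hc, k = best_k)
table(groups)
```

Taking the largest acceptable k corresponds to the lowest acceptable
cut height in the loops above, since merging clusters can only keep or
raise the smallest cluster size.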

Cheers

On Thu, May 24, 2012 at 10:31 AM,  <r-help.20.trevva at spamgourmet.com> wrote:
> Dear R-Help,
>
> I have a clustering problem with hclust that I hope someone can help
> me with. Consider the classic hclust example:
>
>     hc <- hclust(dist(USArrests), "ave")
>     plot(hc)
>
> I would like to cut the tree up in such a way so as to avoid small
> clusters, so that we get a minimum number of items in each cluster,
> and therefore avoid singletons. e.g. in this example, you can see that
> Hawaii is split off onto its own at quite a high level. I would like
> to avoid having a single item clustered on its own like this. How can
> I achieve this?
>
> I have tried manually modifying the tree using dendrapply but have not
> been able to produce a valid solution thus far..
>
> Suggestions are welcome.
>
> Best wishes,
>
> Mark


