[R] Colouring hclust() trees
Richard A. O'Keefe
ok at cs.otago.ac.nz
Tue May 11 05:24:15 CEST 2004
I asked about putting some kind of coloured rug under a dendrogram.
Thomas Petzoldt <petzoldt at rcs.urz.tu-dresden.de> replied:
> One possibility is to extract the coordinates used by the dendrogram
> using par("usr") ...
Er, the documentation for par("usr") says
    'usr' A vector of the form 'c(x1, x2, y1, y2)' giving the extremes
          of the user coordinates of the plotting region.  When a
          logarithmic scale is in use (i.e., 'par("xlog")' is true, see
          below), then the x-limits will be '10 ^ par("usr")[1:2]'.
          Similarly for the y-axis.
But I _know_ the (logical) coordinates of the plotting region; what I need
is the coordinates of the leaves of the dendrogram.
> but as a global alternative in cases like this (many cases and
> known number of classes), I would suggest a different cluster
> algorithm, e.g. ?kmeans.
That doesn't really help, amongst other things because kmeans is not
a hierarchical algorithm. I *DON'T* know the true number of classes.
I know how many classes the person who collected the data thinks there
are, and I don't need to do any clustering to find them, he gave me a
simple rule. What I want to know is how many clusters there OUGHT to be
and how similar these clusters are to the ones he thought there were.
From poking around, the "right" number of clusters is somewhere between
2 and 6. (For the record, I _have_ tried kmeans and I've tabulated the
kmeans groups against the prespecified groups.)
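
A minimal sketch of that kind of tabulation, for anyone following along.
The data here are an assumption: the built-in iris measurements stand in for
the real data, and iris$Species stands in for the prespecified groups.

```r
# Stand-in data (assumption): iris measurements, with Species playing
# the role of the groups supplied by the person who collected the data.
x <- scale(iris[, 1:4])
given <- iris$Species

set.seed(1)  # kmeans starts from random centres, so fix the seed
for (k in 2:6) {
  km <- kmeans(x, centers = k, nstart = 20)
  cat("k =", k, "\n")
  # Rows are the kmeans groups, columns the prespecified groups.
  print(table(kmeans = km$cluster, given = given))
}

# The same comparison for a hierarchical clustering cut into 3 groups.
hc <- hclust(dist(x))
tab3 <- table(hclust = cutree(hc, k = 3), given = given)
print(tab3)
```

A table with one dominant cell per row suggests the clusters line up with the
prespecified groups; rows spread across several columns suggest they do not.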
> If you want to get a visual idea you may try to apply an
> ordination method (e.g. princomp or isoMDS, the latter from
> package MASS) and color the objects according to their class
> found by kmeans.
I had already done that (using the prespecified classes, not classes found
by kmeans). But it didn't solve my present problem, which was overlaying
the *prespecified* classes onto a dendrogram.
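
For reference, an ordination plot coloured by prespecified class can be
sketched roughly as follows. Again the iris data and Species are stand-ins
(an assumption), and the palette is arbitrary. Only princomp is shown.

```r
# Stand-in data (assumption): iris, with Species as the prespecified classes.
x <- scale(iris[, 1:4])
given <- iris$Species

# Principal components; pc$scores holds the coordinates of each observation.
pc <- princomp(x)

# One colour per prespecified class (arbitrary palette).
palette3 <- c("red", "blue", "darkgreen")
cols <- palette3[as.integer(given)]

plot(pc$scores[, 1:2], col = cols, pch = 19,
     xlab = "Comp. 1", ylab = "Comp. 2")
legend("topright", legend = levels(given), col = palette3, pch = 19)
```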
Two other people gave me answers that are spot on.
Unfortunately, I've now lost their messages, so I can't name them.
Suggestion 1: use the RowSideColors (or ColSideColors) argument of heatmap().
This gives me two dendrograms (and I can suppress one if I want) and a heat
image of the data, and all things considered, it's *better* than what I wanted.
(I was aware of heatmap, but I'd failed to notice the relevance, or even the
existence, of the ???SideColors arguments.) In this particular case, the
graph _beautifully_ displays what I want it to display.
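
A minimal sketch of Suggestion 1, assuming stand-in data (iris, with Species
as the prespecified classes) and an arbitrary choice of colours. Colv = NA
suppresses the column dendrogram; RowSideColors takes one colour per row.

```r
# Stand-in data (assumption): iris, with Species as the prespecified classes.
x <- scale(as.matrix(iris[, 1:4]))
given <- iris$Species

# One colour per row, chosen by prespecified class (arbitrary palette).
side <- c("red", "blue", "darkgreen")[as.integer(given)]

# Row dendrogram plus a coloured strip showing the prespecified classes.
# The data are already scaled, so heatmap's own scaling is turned off.
hm <- heatmap(x, Colv = NA, RowSideColors = side,
              scale = "none", labRow = NA)

# hm$rowInd gives the row order used in the plot, i.e. the leaf order.
head(hm$rowInd)
```

If the coloured strip falls into a few solid blocks, the prespecified classes
agree with the dendrogram; a finely striped strip means they do not.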
Suggestion 2: use the draw.clust function from the maptree package.
I have now installed this package (which R makes *so* easy) and it does
exactly what I asked for.
Both of these approaches work with any dendrogram.
I'm beginning to suspect that if something isn't already available in R,
I'll never be able to imagine a need for it. But then I'm a bear of
very little brain...