[R] Colouring hclust() trees
Richard A. O'Keefe
ok at cs.otago.ac.nz
Mon May 10 05:29:26 CEST 2004
I have a data set with 6 variables and 251 cases.
The people who supplied me with this data set believe that it falls
naturally into three groups, and have given me a rule for determining
group number from these 6 variables.
If I do
    scaled.stuff <- scale(stuff, TRUE, c(...the design ranges...))
    stuff.dist <- dist(scaled.stuff)
    stuff.hc <- hclust(stuff.dist)
I get a dendrogram which looks sort of plausible, but
(a) with this many leaves, the leaf labels really aren't legible at any
plausible scaling, and would be best omitted. I could figure out
which point was which if there were some way to use identify(), but
I'm just not seeing it.
(b) what I'd really like to do is to colour the leaves according to the
predicted group, or some other variable. The obvious thing to try is
but that doesn't work. I read everything that seemed plausible, and
came across nodePar, but
    col <- c("red","green","blue")[stuff.predicted.group]
tells me repeatedly that
    parameter "nodePar" couldn't be set in high-level plot() function
draws the dendrogram (_much_ slower than plot() does) and still gives
me no colouring at all. Clearly I have misunderstood how to use
nodePar.
(c) The obvious fall-back is to use points() to draw the nodes again in
the colours I want, but if I could do that, I could use identify().
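The points() fall-back in (c) can be sketched as follows. This is only a sketch under assumptions, not a tested answer for the real data: the `stuff` matrix and `stuff.predicted.group` below are synthetic stand-ins, and `leaf.x` and `leaf.y` are hypothetical helper names. It relies on plot.hclust() drawing the leaves at x = 1, 2, ..., n in the order given by `stuff.hc$order`, and on hang = -1 making every leaf hang down to height 0.

```r
# Synthetic stand-in for the real data: 30 cases, 6 variables, 3 groups.
set.seed(1)
stuff.predicted.group <- rep(1:3, each = 10)
stuff <- matrix(rnorm(180), ncol = 6) + 3 * stuff.predicted.group

stuff.hc <- hclust(dist(scale(stuff)))
col <- c("red", "green", "blue")[stuff.predicted.group]

# Draw the tree without leaf labels; hang = -1 drops all leaves to height 0.
plot(stuff.hc, labels = FALSE, hang = -1)

# Leaf i (counting from the left) is case stuff.hc$order[i].
leaf.x <- seq_along(stuff.hc$order)
leaf.y <- rep(0, length(leaf.x))
points(leaf.x, leaf.y, col = col[stuff.hc$order], pch = 19)

# Interactive sessions only: click near a leaf to show its case number.
if (interactive()) identify(leaf.x, leaf.y, labels = stuff.hc$order)
```

Because the leaf coordinates are known, the same two vectors serve both points() and identify().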
The frustrating thing is that when I do
    d <- dim(stuff)[1]
    plot(1:d, 1:d, col=col[stuff.hc$order])
it shows me that there _is_ a strong connection between the groups found by
hclust() and the predicted groups, albeit not a simple one.
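One way to put a number on that connection, as a rough sketch only, is to cut the tree into three clusters with cutree() and cross-tabulate against the predicted groups. The data below are synthetic stand-ins, `hc.group` and `agreement` are hypothetical names, and cutting at k = 3 is an assumption; since the connection is not a simple one, a different cut may fit the real data better.

```r
# Synthetic stand-in for the real data: 30 cases, 6 variables, 3 groups.
set.seed(1)
stuff.predicted.group <- rep(1:3, each = 10)
stuff <- matrix(rnorm(180), ncol = 6) + 3 * stuff.predicted.group

stuff.hc <- hclust(dist(scale(stuff)))

# Cluster membership when the tree is cut into three groups.
hc.group <- cutree(stuff.hc, k = 3)

# Rows: clusters found by hclust(); columns: predicted groups.
agreement <- table(hclust = hc.group, predicted = stuff.predicted.group)
print(agreement)
```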
I have looked at plot.dendrogram() and plotNode() -- using getAnywhere() --
and it looks to me as though what I want *should* be doable, but I've
clearly misunderstood the details of how to do it.
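For what it is worth, here is a minimal sketch of one way this might be done, assuming a version of R that provides dendrapply(). Instead of passing nodePar to plot(), it attaches a "nodePar" attribute to each leaf of the dendrogram. The data are synthetic stand-ins, `colour.leaf` is a hypothetical helper, and the sketch assumes the leaves carry the default labels 1, 2, ..., n (that is, the data have no row names).

```r
# Synthetic stand-in for the real data: 30 cases, 6 variables, 3 groups.
set.seed(1)
stuff.predicted.group <- rep(1:3, each = 10)
stuff <- matrix(rnorm(180), ncol = 6) + 3 * stuff.predicted.group

stuff.hc <- hclust(dist(scale(stuff)))
col <- c("red", "green", "blue")[stuff.predicted.group]

# Give each leaf its own nodePar; inner nodes are returned unchanged.
# Assumes the leaf label is the case number (no row names on the data).
colour.leaf <- function(node) {
  if (is.leaf(node)) {
    case <- as.integer(attr(node, "label"))
    attr(node, "nodePar") <- list(pch = 19, col = col[case])
  }
  node
}

stuff.dend <- dendrapply(as.dendrogram(stuff.hc), colour.leaf)

# leaflab = "none" suppresses the illegible leaf labels.
plot(stuff.dend, leaflab = "none")
```

The colour vector is indexed by case number, so no reordering by `stuff.hc$order` is needed here.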