[R] Colouring hclust() trees

Richard A. O'Keefe ok at cs.otago.ac.nz
Mon May 10 05:29:26 CEST 2004


I have a data set with 6 variables and 251 cases.
The people who supplied me with this data set believe that it falls
naturally into three groups, and have given me a rule for determining
group number from these 6 variables.

If I do
    scaled.stuff <- scale(stuff, TRUE, c(...the design ranges...))
    stuff.dist <- dist(scaled.stuff)
    stuff.hc <- hclust(stuff.dist)
    plot(stuff.hc)
I get a dendrogram which looks sort of plausible, but

(a) with this many leaves, the leaf labels really aren't legible at any
    plausible scaling, and would be best omitted.  I could figure out
    which point was which if there were some way to use identify(), but
    I'm just not seeing it.

(b) what I'd really like to do is to colour the leaves according to the
    predicted group, or some other variable.  The obvious thing to try is
    plot(stuff.hc, col=c("red","green","blue")[stuff.predicted.group])
    but that doesn't work.  I read everything that seemed plausible, and
    came across nodePar, but

    col <- c("red","green","blue")[stuff.predicted.group]
    plot(stuff.hc, nodePar=list(col=list("black",col)))

    tells me repeatedly that

    parameter "nodePar" couldn't be set in high-level plot() function 

    while 

    plot(as.dendrogram(stuff.hc), nodePar=list(col=list("black",col)))

    draws the dendrogram (_much_ slower than plot() does) and still gives
    me no colouring at all.  Clearly I have misunderstood how to use
    nodePar.

(c) The obvious fall-back is to use points() to draw the nodes again in
    the colours I want, but if I could do that, I could use identify().
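One way to approach points (a) and (c) together is sketched below. It relies on plot.hclust() drawing leaf k (counting from the left) at x = k, with that leaf being case hc$order[k], and on hang = -1 bringing every leaf down to height 0. The data here are simulated, and `group` is a hypothetical stand-in for stuff.predicted.group; this is a sketch under those assumptions, not a tested recipe for the real data.

```r
# Simulated stand-in for the real data: 30 cases, 6 variables, 3 groups.
set.seed(42)
group <- rep(1:3, each = 10)                  # hypothetical predicted group
stuff <- matrix(rnorm(30 * 6), ncol = 6) + 3 * group
hc <- hclust(dist(scale(stuff)))

# labels = FALSE omits the leaf labels; hang = -1 puts all leaves at height 0.
plot(hc, labels = FALSE, hang = -1)

# Leaf k (left to right) is case hc$order[k] and sits at x = k.
x <- seq_along(hc$order)
y <- rep(0, length(x))
leaf.col <- c("red", "green", "blue")[group[hc$order]]
points(x, y, pch = 19, col = leaf.col)

# Click near a leaf to show its case number (interactive devices only).
if (interactive()) identify(x, y, labels = hc$order)
```

Because the same x and y vectors feed both points() and identify(), the coloured dots and the clickable positions should coincide.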

The frustrating thing is that

    d <- dim(stuff)[1]
    plot(1:d, 1:d, col=col[stuff.hc$order])

shows me that there _is_ a strong connection between the groups found by
hclust() and the predicted groups, albeit not a simple one.
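The strength of that connection could also be checked numerically by cutting the tree into three clusters with cutree() and cross-tabulating against the predicted groups. The sketch below uses simulated data, and `group` is again a hypothetical stand-in for stuff.predicted.group. Cluster numbers from cutree() are arbitrary, so agreement shows up as one dominant cell per row rather than as a filled diagonal.

```r
# Simulated stand-in for the real data: 30 cases, 6 variables, 3 groups.
set.seed(42)
group <- rep(1:3, each = 10)                  # hypothetical predicted group
stuff <- matrix(rnorm(30 * 6), ncol = 6) + 3 * group
hc <- hclust(dist(scale(stuff)))

# Cut the tree into 3 clusters and compare with the predicted groups.
found <- cutree(hc, k = 3)
agreement <- table(found = found, predicted = group)
print(agreement)
```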

I have looked at plot.dendrogram() and plotNode() -- using getAnywhere() --
and it looks to me as though what I want *should* be doable, but I've
clearly misunderstood the details of how to do it.
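For what it is worth, plotNode() also honours a "nodePar" attribute stored on an individual node, so one possible route is to walk the dendrogram with dendrapply() (available in current versions of R) and give each leaf its own colour. The sketch below assumes simulated data with row names, a hypothetical `group` vector in place of stuff.predicted.group, and an illustrative helper called colour.leaf(); none of these names come from the real data set.

```r
# Simulated stand-in for the real data: 30 named cases, 6 variables, 3 groups.
set.seed(42)
group <- rep(1:3, each = 10)                  # hypothetical predicted group
stuff <- matrix(rnorm(30 * 6), ncol = 6) + 3 * group
rownames(stuff) <- paste0("case", 1:30)       # labels carried through dist()
hc <- hclust(dist(scale(stuff)))

# Colour for each case, looked up by its label.
case.col <- setNames(c("red", "green", "blue")[group], rownames(stuff))

# Illustrative helper: attach a per-leaf nodePar; inner nodes are unchanged.
colour.leaf <- function(node) {
  if (is.leaf(node)) {
    this.col <- case.col[[attr(node, "label")]]
    attr(node, "nodePar") <- list(pch = 19, col = this.col,
                                  lab.col = this.col, lab.cex = 0.6)
  }
  node
}

dend <- dendrapply(as.dendrogram(hc), colour.leaf)
plot(dend)
```

Looking colours up by label rather than by position avoids having to reason about the order in which the leaves are visited.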



