[R] dendrogram rect.hclust() not working?
Alex Reynolds
reynolda at u.washington.edu
Fri Apr 3 20:51:14 CEST 2009
I have tried to use rect.hclust() to draw a rectangle around a set of
leaves, but am running into trouble.
rect.hclust() is drawing two rects instead of one, and both are the
wrong size:
--------------------
scoreClusterObj <- hclust(scoreDistanceObj, method=clustMethod)
order <- scoreClusterObj$order
orderedLabels <- rep(0, length(order))
for (orderIndex in 1:length(order)) {
    # this puts a "name" to the permutation of leaves, done by hclust()
    orderedLabels[orderIndex] <- classes[order[orderIndex]]
}
scoreDendrogramObj <- as.dendrogram(scoreClusterObj)
coloredLeafScoreDendrogramObj <- dendrapply(scoreDendrogramObj,
                                            markColoredLeaves)
scoreDendrogramPlot <- plot(coloredLeafScoreDendrogramObj,
                            horiz=FALSE, axes=FALSE)
significantClustersInScoreDendrogramObj <-
    dendrapply(coloredLeafScoreDendrogramObj, markSignificantClusters)
--------------------
I have the local functions markColoredLeaves() -- which changes the
colors of certain leaves, and it works fine -- and another function
called markSignificantClusters(), in which I try to draw a
rect.hclust() if a condition is met (i.e. a cluster is statistically
significant):
markSignificantClusters <<- function (n) {
    if (!is.leaf(n)) {
        a <- attributes(n)
        leafList <- unlist(dendrapply(n, listLabels))
        if (nodesContainCertainLeaves) {
            ma <- match(leafList, orderedLabels)
            print (paste ("min-ma", min(ma), "max-ma", max(ma), sep=" "))
            r <- rect.hclust(scoreClusterObj, h = a$height,
                             which = c(min(ma), max(ma)), border = 2)
            print (r)
            quit()
        }
    }
}
For testing, I have a call to quit() the script after the first
qualifying node has a rect drawn around it.
So I run this script, and when I look at the runtime log output (from
the print() statements), it finds the correct, qualifying node
containing the following two items:
[1] "clusters"
[1] "+v_stat3_01" "+v_stat1_01"
These two leaves are located at positions 5 and 6 of the tree. This is
correct output from the statistical test. So I should only get one
rect drawn, of width 2, containing leaves 5 and 6.
Also, the "ma" variable is returning the correct leaf range (5 to 6,
inclusive), so I know I'm passing the correct leaf range to the
rect.hclust() function.
But in my graphical output, I get two rects at positions 6 and 7:
http://www.flickr.com/photos/alexreynolds/3409263765/sizes/o/
This doesn't seem to be an offset issue, for two reasons:
1. I am getting two rects, not one. Looking at the output from the
print(r) statement, I see why two rects are drawn:
[[1]]
+v_stat1_01
273
[[2]]
+v_e2f1_q6_01
326
2. When I re-run the script, I can occasionally get different cluster
results. In one case, where I should get one rect of size 3
(containing 3 leaves), I instead get two rects containing 2 and 4
leaves, respectively, separated by several other clusters.
Worse, if I plot the dendrogram horizontally, the rects are drawn of
completely wrong dimensions:
scoreDendrogramPlot <- plot(coloredLeafScoreDendrogramObj,
                            horiz=TRUE, axes=FALSE)
yields:
http://www.flickr.com/photos/alexreynolds/3410126060/sizes/o/
Is there a way to use rect.hclust() that works reliably (in both
orientations, or even in one orientation)?
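One reading of ?rect.hclust that might explain the two rects: the
"which" argument selects clusters by number, counted left to right at
the cut, not leaf positions, so which = c(5, 6) asks for the 5th and
6th clusters. The "x" argument instead selects the cluster containing
a given horizontal coordinate, and in a vertical plot the leaf at
position p sits at x = p. The sketch below illustrates that reading
and is not a confirmed fix. rectAroundLeaves() is a hypothetical
helper, the toy matrix stands in for scoreDistanceObj, and the cut is
found by searching with cutree() rather than by passing h directly. As
far as I can tell, rect.hclust() computes its rectangles in
vertical-plot coordinates only, so this would not help with horiz=TRUE.

```r
# Toy data standing in for the real distance object (an assumption).
set.seed(1)
toy <- matrix(rnorm(40), nrow = 10,
              dimnames = list(paste0("leaf", 1:10), NULL))
hc <- hclust(dist(toy), method = "average")

# Hypothetical helper: draw one rect around the cluster made up of
# exactly the leaves in leafLabels, on an already drawn vertical plot.
rectAroundLeaves <- function(hc, leafLabels, border = 2) {
    # left-to-right positions of the leaves in the plotted tree
    pos <- match(leafLabels, hc$labels[hc$order])
    n <- length(hc$order)
    # rect.hclust() accepts k between 2 and n - 1
    for (k in seq(n - 1, 2)) {
        cl <- cutree(hc, k = k)
        id <- unique(cl[leafLabels])
        # the leaves must share one cluster and fill it completely
        if (length(id) == 1 && sum(cl == id) == length(leafLabels)) {
            return(rect.hclust(hc, k = k, x = min(pos), border = border))
        }
    }
    stop("these leaves do not form a single cluster at any allowed cut")
}

# The first merge always joins two single leaves; use them as the target.
target <- hc$labels[-hc$merge[1, ]]
plot(as.dendrogram(hc))
res <- rectAroundLeaves(hc, target)
print(res)
```

The returned list should hold a single element naming the two target
leaves, which is the check that only one rect was requested.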
Alternatively, is there a way to modify the thickness and color of the
edges that draw down from a "significant" node?
I tried adding this to my markSignificantClusters() function, within
the "if (nodesContainCertainLeaves)" block, to modify the edgePar
settings of a qualifying node, to no effect:
n <- dendrapply(n, function(e) {
    attr(e, "edgePar") <- list(lty=3, col="red")
    e
})
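A possible explanation for the lack of effect, offered as a guess: the
modified copy of n is local to markSignificantClusters(), which never
returns it, and dendrapply() appears to recurse into the original
children anyway, so changes made below the node a function was handed
may be overwritten. The sketch below sidesteps dendrapply() with plain
recursion. It runs under assumptions: colorSubtree() and markSubtree()
are hypothetical helpers, the toy data stands in for the real scores,
and the "significant" node is simply the one whose leaves match a
given set. As far as I can tell, a node's edgePar styles the edge
coming into it from its parent, so the sketch sets it on the
descendants only.

```r
# Toy data standing in for the real scores (an assumption).
set.seed(1)
toy <- matrix(rnorm(40), nrow = 10,
              dimnames = list(paste0("leaf", 1:10), NULL))
hc <- hclust(dist(toy), method = "average")
dend <- as.dendrogram(hc)

# Hypothetical helper: set edgePar on a node and on everything below it.
colorSubtree <- function(node, edgePar) {
    if (!is.leaf(node)) {
        a <- attributes(node)
        node <- lapply(unclass(node), colorSubtree, edgePar = edgePar)
        attributes(node) <- a   # restore class, height, members, ...
    }
    attr(node, "edgePar") <- edgePar
    node
}

# Hypothetical helper: walk the tree; when a node's leaves are exactly
# `target`, restyle the edges below that node and stop descending.
markSubtree <- function(node, target, edgePar) {
    if (is.leaf(node)) return(node)
    a <- attributes(node)
    if (setequal(labels(node), target)) {
        node <- lapply(unclass(node), colorSubtree, edgePar = edgePar)
    } else {
        node <- lapply(unclass(node), markSubtree,
                       target = target, edgePar = edgePar)
    }
    attributes(node) <- a
    node
}

# The first merge always joins two single leaves; use them as the target.
target <- hc$labels[-hc$merge[1, ]]
marked <- markSubtree(dend, target, list(lty = 3, col = "red"))
plot(marked, axes = FALSE)
```

Because the helpers return the modified tree, the result has to be
assigned and plotted; calling them for side effects alone changes
nothing.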
If you got this far through this message, thanks. :) I would be
grateful for any advice.
Thanks,
Alex