[R] dendrogram rect.hclust() not working?

Alex Reynolds reynolda at u.washington.edu
Fri Apr 3 20:51:14 CEST 2009


I am trying to use rect.hclust() to draw a rectangle around a set of  
leaves, but am running into trouble.

rect.hclust() draws two rectangles instead of one, and both are the  
wrong size:

--------------------
scoreClusterObj <- hclust(scoreDistanceObj, method=clustMethod)
order <- scoreClusterObj$order
orderedLabels <- rep(0, length(order))
for (orderIndex in 1:length(order)) {
	# this puts a "name" to the permutation of leaves, done by hclust()
	orderedLabels[orderIndex] <- classes[order[orderIndex]]
}
scoreDendrogramObj <- as.dendrogram(scoreClusterObj)
coloredLeafScoreDendrogramObj <- dendrapply(scoreDendrogramObj, markColoredLeaves)
scoreDendrogramPlot <- plot(coloredLeafScoreDendrogramObj, horiz=FALSE, axes=FALSE)
significantClustersInScoreDendrogramObj <- dendrapply(coloredLeafScoreDendrogramObj, markSignificantClusters)
--------------------

I have the local functions markColoredLeaves() -- which changes the  
colors of certain leaves, and it works fine -- and another function  
called markSignificantClusters(), in which I try to draw a  
rect.hclust() if a condition is met (i.e. a cluster is statistically  
significant):

markSignificantClusters <<- function (n) {
	if (!is.leaf(n)) {
		a <- attributes(n)
		leafList <- unlist(dendrapply(n, listLabels))
		if (nodesContainCertainLeaves) {
			ma <- match(leafList, orderedLabels)
			print (paste ("min-ma", min(ma), "max-ma", max(ma), sep=" "))
			r <- rect.hclust(scoreClusterObj, h = a$height,
			                 which = c(min(ma), max(ma)), border = 2)
			print (r)
			quit()
		}
	}
}

For testing, I have a call to quit() the script after the first  
qualifying node has a rect drawn around it.

So I run this script, and when I look at the runtime log output (from  
the print() statements), it finds the correct, qualifying node  
containing the following two items:

[1] "clusters"
[1] "+v_stat3_01" "+v_stat1_01"

These two leaves are located at positions 5 and 6 of the tree. This is  
correct output from the statistical test. So I should only get one  
rect drawn, of width 2, containing leaves 5 and 6.

Also, the "ma" variable holds the correct leaf range (5 to 6,  
inclusive), so I know I'm passing the correct leaf range to  
the rect.hclust() function.

But in my graphical output, I get two rects at positions 6 and 7:

http://www.flickr.com/photos/alexreynolds/3409263765/sizes/o/

This doesn't seem to be an offset issue, for two reasons:

1. I am getting two rects, not one. Looking at the output from the  
print(r) statement, I see why two rects are drawn:

[[1]]
+v_stat1_01
         273

[[2]]
+v_e2f1_q6_01
           326

2. When I re-run the script, I occasionally get different cluster  
results. In one case, where I should get one rect of size 3  
(containing 3 leaves), I instead get two rects containing 2 and 4  
leaves, respectively, separated by several other clusters.
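
A note on the two-rect output: going by the rect.hclust() help page, the
"which" argument selects clusters by number (counted from left to right
in the tree after the cut), not leaf positions, while "x" selects the
clusters containing the given horizontal coordinates. Under that
reading, which = c(5, 6) asks for two rectangles. The sketch below uses
the built-in USArrests data as a stand-in for the real scores, so the
data and variable names are assumptions, not part of the script above.

```r
pdf(NULL)   # null device, so the sketch runs non-interactively
hc <- hclust(dist(USArrests[1:10, ]), method = "average")
plot(hc)

# Cut into k = 7 clusters; 'which' picks clusters by left-to-right rank.
one <- rect.hclust(hc, k = 7, which = 5)        # one rect: the 5th cluster
two <- rect.hclust(hc, k = 7, which = c(5, 6))  # two rects: 5th and 6th clusters

# 'x' picks the cluster that contains a given leaf position instead.
atLeaf5 <- rect.hclust(hc, k = 7, x = 5)
dev.off()
```

Each call returns a list with one element per rectangle drawn, which
matches the two-element list printed above.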

Worse, if I plot the dendrogram horizontally, the rects are drawn with  
completely wrong dimensions:

scoreDendrogramPlot <- plot(coloredLeafScoreDendrogramObj, horiz=TRUE,  
axes=FALSE)

yields:

http://www.flickr.com/photos/alexreynolds/3410126060/sizes/o/
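
As far as I can tell, rect.hclust() computes its coordinates for the
vertical layout only, which would explain the horizontal result. One
possible workaround is to call rect() directly: leaves are plotted at
positions 1, 2, ..., n along the leaf axis in both orientations. The
following is only a sketch on the built-in USArrests data;
drawLeafRect() is a hypothetical helper, not an existing R function.

```r
pdf(NULL)   # null device, so the sketch runs non-interactively
hc <- hclust(dist(USArrests[1:10, ]), method = "average")
dend <- as.dendrogram(hc)

# Hypothetical helper: box the leaves at plot positions from..to,
# from height 0 up to height 'top'.
drawLeafRect <- function(from, to, top, horiz = FALSE, border = 2) {
  lo <- from - 0.4
  hi <- to + 0.4
  if (horiz) {
    rect(0, lo, top, hi, border = border)   # height on x, leaves on y
  } else {
    rect(lo, 0, hi, top, border = border)   # leaves on x, height on y
  }
  invisible(c(lo = lo, hi = hi, top = top))
}

# Example node: the first merge always joins two single leaves.
pos <- match(-hc$merge[1, ], hc$order)   # their positions in plot order
top <- hc$height[1]                      # height of that node

plot(dend, horiz = FALSE, axes = FALSE)
v <- drawLeafRect(min(pos), max(pos), top, horiz = FALSE)

plot(dend, horiz = TRUE, axes = FALSE)
h <- drawLeafRect(min(pos), max(pos), top, horiz = TRUE)
dev.off()
```

In the script above, min(ma), max(ma) and a$height would play the roles
of from, to and top.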

Is there a way to use rect.hclust() that works reliably (in both  
orientations, or even in one orientation)?

Alternatively, is there a way to modify the thickness and color of the  
edges that descend from a "significant" node?

I tried adding this to my markSignificantClusters() function, within  
the "if (nodesContainCertainLeaves)" block, to modify the edgePar  
settings of a qualifying node, to no effect:

n <- dendrapply(n, function(e) {
	attr(e, "edgePar") <- list(lty=3, col="red")
	e
})
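
One likely reason this has no visible effect: the assignment only
changes the local copy of n inside the function, and the dendrogram has
already been plotted by then. A sketch of the alternative order, where
the function passed to dendrapply() returns the modified node and the
plot is drawn afterwards, is below. It uses the built-in USArrests data,
and the "at most 3 leaves" test is a made-up stand-in for the real
significance condition. As I understand it, a node's edgePar styles the
edge leading into that node.

```r
pdf(NULL)   # null device, so the sketch runs non-interactively
hc <- hclust(dist(USArrests[1:10, ]), method = "average")
dend <- as.dendrogram(hc)

# Hypothetical condition: mark every node that has at most 3 leaves.
# Descendants of such a node also qualify, so the whole subtree is styled.
highlightSmallNodes <- function(node) {
  if (attr(node, "members") <= 3) {
    attr(node, "edgePar") <- list(lty = 3, lwd = 2, col = "red")
  }
  node   # dendrapply() rebuilds the tree from the returned nodes
}

marked <- dendrapply(dend, highlightSmallNodes)
plot(marked, horiz = FALSE, axes = FALSE)   # plot only after marking
dev.off()
```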

If you got this far through this message, thanks. :) I would be  
grateful for any advice.

Thanks,
Alex



