[BioC] ctc package - cluster dendrogram

Donna Toleno toleno at usc.edu
Mon Oct 15 23:54:53 CEST 2007


Hello list.

When I make an R cluster dendrogram, it looks very different from the clustering in the Newick file displayed in TreeView (Rod Page's program). I tried a simple example with 12 probes and 3 samples, and I calculated the Euclidean distances both manually and with R.


> library(ctc)
> data
         V1       V2       V3
1  4.184499 4.142575 4.017366
2  3.459849 3.455023 3.732115
3  8.287278 4.887692 5.007794
4  4.137224 4.523774 4.191996
5  4.431768 4.356945 4.570331
6  3.867442 3.931225 3.967566
7  3.480681 3.609997 3.522618
8  3.460785 3.966638 3.708675
9  4.306729 4.480724 4.399165
10 4.290001 4.036634 4.078688
11 6.707544 7.179901 9.475103
12 6.837264 6.845438 7.364477
> hc <- hcluster(t(data), link = "ave")
> write(hc2Newick(hc),file='hclust_12_probes_newick')
> plot (hc)
> hc

Call:
hcluster(x = t(data), link = "ave")

Cluster method   : average 
Distance         : euclidean 
Number of objects: 3 

'hclust_12_probes_newick' file contains:
(V1:0.752346233726435,(V2:1.21282408894056,V3:1.21282408894056):0.752346233726435);

I can see that the Newick tree above places sample 2 and sample 3 the appropriate distance apart, about 2.4, but where does the 0.7523... come from? How do I interpret "Height" on the y-axis of the dendrogram? I would like a tree that represents the expression difference. The Newick tree viewed in TreeView (Rod Page's program) looks different from the dendrogram produced by hcluster, and its branch lengths still do not reflect the Euclidean distances: in my example, the Newick tree shows all three samples about equidistant from each other. Perhaps I should be using phylogenetic tree drawing to get the appropriate branch lengths from the Euclidean distances? I also experimented with hclust2treeview, but this seems to refer to Michael Eisen's TreeView. I am not familiar with that program or the file formats it uses.
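One possible reading of the 0.7523 value, sketched as plain arithmetic in R. Under average linkage, V2 and V3 merge first at their own distance, and V1 then joins at the mean of its distances to V2 and V3. The sketch assumes that each Newick branch length is half the difference between two merge heights; that assumption is inferred from the numbers in this post, not taken from the ctc documentation, and all variable names are illustrative.

```r
# Pairwise Euclidean distances between samples, as reported in this post.
d12 <- 3.508321   # V1-V2
d13 <- 4.352360   # V1-V3
d23 <- 2.425648   # V2-V3

# Average linkage: V2 and V3 merge first, then V1 joins the (V2,V3) cluster
# at the mean of its distances to V2 and V3.
h_first <- d23
h_root  <- (d12 + d13) / 2

# Assumption (inferred, not documented): a Newick branch length is half the
# difference between the merge heights at its two ends.
leaf_branch     <- h_first / 2              # V2 and V3 branches
internal_branch <- (h_root - h_first) / 2   # branch above the (V2,V3) cluster

# Tip-to-tip path lengths read directly from the Newick string in the post.
path_v2_v3 <- 2 * 1.21282408894056
path_v1_v2 <- 0.752346233726435 + 0.752346233726435 + 1.21282408894056

print(c(h_first = h_first, h_root = h_root))
print(c(leaf_branch = leaf_branch, internal_branch = internal_branch))
print(c(path_v2_v3 = path_v2_v3, path_v1_v2 = path_v1_v2))
```

Under that assumption the internal branch comes to about 0.7523 and the V2/V3 branches to about 1.2128, matching the file. The V1 branch in the file is also 0.7523, so the V1 to V2 path through the tree is about 2.72 rather than the 3.51 Euclidean distance, which would explain why the three samples look nearly equidistant in TreeView.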

Thank you for reading. Any comments will be appreciated. 

Euclidean distance manually calculated in Excel for all of the 12 probes:

         V2            V3
V1   3.508320996   4.352360295
V2                 2.425648178

> distances.12.probes <- as.matrix(dist(t(data), method = "euclidean", diag = FALSE))
> distances.12.probes
         V1       V2       V3
V1 0.000000 3.508321 4.352360
V2 3.508321 0.000000 2.425648
V3 4.352360 2.425648 0.000000
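As a cross-check on what "Height" means, here is a minimal sketch that feeds the distance matrix above to base R's hclust (from the stats package) instead of ctc's hcluster. It assumes the two functions agree for Euclidean distance with average linkage; the object names `d`, `hc_base` and `coph` are illustrative. The merge heights are the y-axis values of the dendrogram, and the cophenetic distances are the distances the dendrogram itself implies between each pair of samples.

```r
# Distance matrix between the three samples, copied from the output above.
samples <- c("V1", "V2", "V3")
d <- matrix(c(0.000000, 3.508321, 4.352360,
              3.508321, 0.000000, 2.425648,
              4.352360, 2.425648, 0.000000),
            nrow = 3, byrow = TRUE,
            dimnames = list(samples, samples))

# Base R average-linkage clustering, used here in place of ctc's hcluster.
hc_base <- hclust(as.dist(d), method = "average")

# Merge heights: the values plotted on the "Height" axis of the dendrogram.
print(hc_base$height)

# Cophenetic distances: for each pair of samples, the height at which the
# two samples are first joined into one cluster.
coph <- as.matrix(cophenetic(hc_base))
print(coph)
```

With three samples, the dendrogram can only reproduce the V2 to V3 distance exactly. V1 is joined at a single height (the mean of 3.508321 and 4.352360), so its cophenetic distance to V2 and to V3 is the same value.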


Thank you again.

-Donna


