[BioC] ctc package - cluster dendrogram

Donna Toleno toleno at usc.edu
Wed Oct 17 18:53:45 CEST 2007


> Hi!
> 
> If you draw a dendrogram in R, the y-axis is the distance between
> objects. In your case, the tree looks roughly like:
> 
> 4  +--+
> |  |  |
> 3  1  |
> |    +-+
> 2    | |
>      2 3
> 
> As the branch which connects V2 and V3 is at approx. 2.4, that is the
> distance between these objects (samples). The same applies to the
> distance between samples V1 and V2 (or V1 and V3). Those connect at
> approx. 3.9, and that is the average-linkage distance between V1 and
> the cluster formed by V2 and V3. You can plot the tree using
> 
> plot(hc, hang=0)
> 
> and this should become more evident.
> 
> This is in contrast to Treeview, which visualizes the distances as
> branch lengths. If you visualize the tree in Treeview (by Rod Page),
> the branches are the Euclidean distances between the samples, and are
> not equidistant. For example, the distance between V2 and V3 is
> approx. 2.4. In the tree drawn by Treeview, the branch lengths are
> half of that, so each terminal branch leading to either V2 or V3 is
> about 1.2.
> 
> You also asked about the 0.752... distances in the tree:
> 
> > 'hclust_12_probes_newick' file contains:
> > (V1:0.752346233726435,(V2:1.21282408894056,V3:1.21282408894056):
> > 0.752346233726435);
> 
> The first is the length of the branch leading to V1, the other is the
> length of the only internal branch of the tree. Those are computed
> from the pairwise distances between samples using the average linkage
> (UPGMA) algorithm.
> 
> - Jarno
> 

> >
> >> library(ctc)
> >> data
> >         V1       V2       V3
> > 1  4.184499 4.142575 4.017366
> > 2  3.459849 3.455023 3.732115
> > 3  8.287278 4.887692 5.007794
> > 4  4.137224 4.523774 4.191996
> > 5  4.431768 4.356945 4.570331
> > 6  3.867442 3.931225 3.967566
> > 7  3.480681 3.609997 3.522618
> > 8  3.460785 3.966638 3.708675
> > 9  4.306729 4.480724 4.399165
> > 10 4.290001 4.036634 4.078688
> > 11 6.707544 7.179901 9.475103
> > 12 6.837264 6.845438 7.364477
> >> hc <- hcluster(t(data), link = "ave")
> >> write(hc2Newick(hc),file='hclust_12_probes_newick')
> >> plot (hc)
> >> hc
> >
> > Call:
> > hcluster(x = t(data), link = "ave")
> >
> > Cluster method   : average
> > Distance         : euclidean
> > Number of objects: 3
> >
> > 'hclust_12_probes_newick' file contains:
> > (V1:0.752346233726435,(V2:1.21282408894056,V3:1.21282408894056):0.752346233726435);
> >
> > I can see that the above Newick formatted tree shows that sample 2
> > and sample 3 are the appropriate distance apart, about 2.4, but
> > where does the 0.7523... come from? How do I interpret "Height" on
> > the y-axis of this dendrogram? .....
> >
> > Euclidean distance manually calculated in Excel for all of the 12 probes:
> >
> >              V2            V3
> > V1   3.508320996   4.352360295
> > V2                 2.425648178
> >
> >> distances.12.probes <- as.matrix(dist(t(data), method = "euclidean", diag = FALSE))
> >> distances.12.probes
> >         V1       V2       V3
> > V1 0.000000 3.508321 4.352360
> > V2 3.508321 0.000000 2.425648
> > V3 4.352360 2.425648 0.000000
> >
> >
> > Thank you again.
> >
> > -Donna


Thank you.

I understand it completely now.  

(4.352360 + 3.508321)/2 = 3.93, the average of the distances from V1 to V2 and from V1 to V3.

Then 3.93 - 2.43 = 1.50, and

1.50/2 = 0.75 for the internal branch and the branch for V1.
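
The same arithmetic can be checked in R. This is only a sketch: it runs base R's hclust() on the distance matrix quoted above rather than ctc's hcluster(), on the assumption that both give the same average-linkage merge heights for these three samples. The variable names (m, hc_check, heights and so on) are made up for illustration.

```r
# Pairwise Euclidean distances copied from the dist() output quoted above
samples <- c("V1", "V2", "V3")
m <- matrix(c(0,        3.508321, 4.352360,
              3.508321, 0,        2.425648,
              4.352360, 2.425648, 0),
            nrow = 3, byrow = TRUE, dimnames = list(samples, samples))

# Base R average linkage (UPGMA); assumed stand-in for hcluster(..., link = "ave")
hc_check <- hclust(as.dist(m), method = "average")

# Merge heights: V2 joins V3 first, then V1 joins the (V2, V3) cluster
heights <- hc_check$height

# Halve the heights, as the branch lengths in the Newick file appear to be
terminal_v2_v3  <- heights[1] / 2                 # about 1.2128
internal_branch <- (heights[2] - heights[1]) / 2  # about 0.7523

print(heights)
print(c(terminal_v2_v3, internal_branch))
```

If the assumption holds, the two heights are the values read off the y-axis of the dendrogram (about 2.4 and 3.9), and the halved values match the numbers in the Newick file.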


