[BioC] ctc package - cluster dendrogram
Donna Toleno
toleno at usc.edu
Wed Oct 17 18:53:45 CEST 2007
> Hi!
>
> If you draw a dendrogram in R, the y-axis is the distance between
> objects.
> In your case, the tree looks roughly like:
>
> 4    +--+
> |    |  |
> 3    1  |
> |      +-+
> 2      | |
>        2 3
>
> The branch connecting V2 and V3 sits at approx. 2.4, which is the
> distance between these objects (samples). The same applies to
> samples V1 and V2 (or V1 and V3): V1 joins their cluster at approx.
> 3.9, the average of its distances to V2 and V3. You can plot the
> tree using
>
> plot(hc, hang=0)
>
> and this should become more evident.
>
> This is in contrast to Treeview, which visualizes the distances as
> branch lengths. If you visualize the tree in Treeview (by Rod Page),
> the branches reflect the Euclidean distances between the samples and
> are not equidistant. For example, the distance between V2 and V3 is
> approx. 2.4. In the tree drawn by Treeview, the branch lengths are
> half of that, so each terminal branch leading to either V2 or V3 is
> about 1.2.
>
> You also asked about the 0.752... distances in the tree:
>
> > 'hclust_12_probes_newick' file contains:
> > (V1:0.752346233726435,(V2:1.21282408894056,V3:1.21282408894056):
> > 0.752346233726435);
>
> The first is the length of the branch leading to V1; the other is the
> length of the only internal branch of the tree. Both are computed
> from the pairwise distances between samples using the average
> linkage (UPGMA) algorithm.
>
> - Jarno
>
> >
> >> library(ctc)
> >> data
> >          V1       V2       V3
> > 1  4.184499 4.142575 4.017366
> > 2  3.459849 3.455023 3.732115
> > 3  8.287278 4.887692 5.007794
> > 4  4.137224 4.523774 4.191996
> > 5  4.431768 4.356945 4.570331
> > 6  3.867442 3.931225 3.967566
> > 7  3.480681 3.609997 3.522618
> > 8  3.460785 3.966638 3.708675
> > 9  4.306729 4.480724 4.399165
> > 10 4.290001 4.036634 4.078688
> > 11 6.707544 7.179901 9.475103
> > 12 6.837264 6.845438 7.364477
> >> hc <- hcluster(t(data), link = "ave")
> >> write(hc2Newick(hc),file='hclust_12_probes_newick')
> >> plot (hc)
> >> hc
> >
> > Call:
> > hcluster(x = t(data), link = "ave")
> >
> > Cluster method : average
> > Distance : euclidean
> > Number of objects: 3
> >
> > 'hclust_12_probes_newick' file contains:
> >
> > (V1:0.752346233726435,(V2:1.21282408894056,V3:1.21282408894056):0.752346233726435);
> >
> > I can see that the above Newick formatted tree shows that sample 2
> > and sample 3 are the appropriate distance apart, about 2.4, but
> > where does the 0.7523... come from? How do I interpret "Height" on
> > the y-axis of this dendrogram? .....
> >
> > Euclidean distance manually calculated in Excel for all of the 12
> > probes:
> >
> >             V2          V3
> > V1 3.508320996 4.352360295
> > V2             2.425648178
> >
> >> distances.12.probes <- as.matrix(dist(t(data), method = "euclidean", diag = FALSE))
> >> distances.12.probes
> >          V1       V2       V3
> > V1 0.000000 3.508321 4.352360
> > V2 3.508321 0.000000 2.425648
> > V3 4.352360 2.425648 0.000000
> >
> >
> > Thank you again.
> >
> > -Donna
Thank you.
I understand it completely now.
(4.352360 + 3.508321)/2 = 3.930, the average of the distances from V1 to V2 and from V1 to V3.
Then 3.930 - 2.426 = 1.505, the gap between the two merge heights.
1.505/2 = 0.752 for the internal branch and the branch for V1.
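
The same arithmetic can be checked in R. This is only a sketch: it uses base R's hclust() with method = "average" rather than ctc's hcluster(), on the assumption that both give the same average-linkage heights for these three samples, and it reproduces the branch lengths from the merge heights without claiming to show how hc2Newick computes them internally. The names labs, terminal and internal are introduced here for illustration.

```r
# Pairwise Euclidean distances as printed earlier in the thread.
labs <- c("V1", "V2", "V3")
d <- as.dist(matrix(c(0,        3.508321, 4.352360,
                      3.508321, 0,        2.425648,
                      4.352360, 2.425648, 0),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(labs, labs)))

# Average linkage (UPGMA) with base R; an assumed stand-in for hcluster().
hc <- hclust(d, method = "average")

# Merge heights: V2 and V3 join first, then V1 joins that cluster.
print(hc$height)

# Half the first merge height: the terminal branch to V2 and to V3.
terminal <- hc$height[1] / 2

# Half the gap between the two merge heights: the 0.752... value.
internal <- (hc$height[2] - hc$height[1]) / 2

print(c(terminal = terminal, internal = internal))
```

The two printed values should agree with the 1.2128... and 0.7523... branch lengths in the Newick file, up to the rounding of the distances entered above.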