[BioC] ctc package - cluster dendrogram
Jarno Tuimala
jtuimala at csc.fi
Wed Oct 17 07:20:49 CEST 2007
Hi!
If you draw a dendrogram in R, the y-axis is the distance between objects.
In your case, the tree looks roughly like:
4   +---+
|   |   |
3   1   |
|      +-+
2      | |
       2 3
The branch connecting V2 and V3 sits at approx. 2.4, so that is the
distance between these objects (samples). The same applies to the
distance between samples V1 and V3 (or V1 and V2). Those connect at
approx. 3.9, which with average linkage is the mean of the two pairwise
distances from V1 (about 3.5 and 4.4). You can plot
the tree using
plot(hc, hang=0)
and this should become more evident.
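To make the join heights concrete, here is a minimal sketch that rebuilds the tree from the pairwise distances reported in the original question. It assumes that base R's hclust() with method = "average" is an acceptable stand-in for hcluster(..., link = "ave"); the matrix d is typed in by hand from the quoted output.

```r
# Pairwise Euclidean distances as reported in the original question
d <- matrix(c(0.000000, 3.508321, 4.352360,
              3.508321, 0.000000, 2.425648,
              4.352360, 2.425648, 0.000000),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("V1", "V2", "V3"), c("V1", "V2", "V3")))

# Base R average-linkage clustering (assumed stand-in for ctc's hcluster)
hc <- hclust(as.dist(d), method = "average")

# Heights at which clusters are joined: the y-axis values of the dendrogram
print(hc$height)

# Draw the leaves down to height 0 so the join heights can be read off the axis
plot(hc, hang = 0)
```

The two values in hc$height are the heights of the V2/V3 join and of the join between V1 and that pair.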
This is in contrast to Treeview, which visualizes the distances as branch
lengths. If you visualize the tree in Treeview (by Rod Page), the branches
are drawn with lengths derived from the Euclidean distances between the
samples, so they are not all the same length.
For example, the distance between V2 and V3 is approx. 2.4. In the tree
drawn by Treeview, the branch lengths are half of that, so each terminal
branch leading to either V2 or V3 is about 1.2.
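As a rough check on how a branch-length viewer reads the tree, the sketch below sums the branch lengths along the path between each pair of tips. The lengths are copied from the Newick string quoted in this thread; the variable names are only illustrative.

```r
# Branch lengths copied from the Newick string quoted in this thread
len_V1       <- 0.752346233726435
len_V2       <- 1.21282408894056
len_V3       <- 1.21282408894056
len_internal <- 0.752346233726435  # branch joining the (V2,V3) pair to the root

# Tip-to-tip path lengths: sum of all branches on the path between two tips
path_V2_V3 <- len_V2 + len_V3
path_V1_V2 <- len_V1 + len_internal + len_V2
path_V1_V3 <- len_V1 + len_internal + len_V3

print(c(V2_V3 = path_V2_V3, V1_V2 = path_V1_V2, V1_V3 = path_V1_V3))
```

The V2 to V3 path reproduces the distance of about 2.4. The two paths from V1 come out at about 2.7, which is consistent with the near-equidistant appearance described in the question.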
You also asked about the 0.752... distances in the tree:
> 'hclust_12_probes_newick' file contains:
> (V1:0.752346233726435,(V2:1.21282408894056,V3:1.21282408894056):0.752346233726435);
The first is the length of the branch leading to V1, the second is the
length of the only internal branch of the tree. Both are computed from the
pairwise distances between samples using the average linkage (UPGMA)
algorithm.
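The numbers in the Newick file can be reproduced by hand from the pairwise distances. The sketch below is only arithmetic on the values quoted in this thread, not a description of how hc2Newick is implemented internally; the variable names are illustrative.

```r
# Pairwise Euclidean distances from the original question
d12 <- 3.508321
d13 <- 4.352360
d23 <- 2.425648

# Average linkage (UPGMA): V2 and V3 join first, at their own distance
h_inner <- d23

# V1 then joins the (V2,V3) cluster at the mean of its distances to both members
h_root <- mean(c(d12, d13))

# Terminal branches to V2 and V3: half of the inner join height
leaf_V2_V3 <- h_inner / 2

# Half of the difference between the two join heights
internal <- (h_root - h_inner) / 2

print(c(h_inner = h_inner, h_root = h_root,
        leaf_V2_V3 = leaf_V2_V3, internal = internal))
```

Up to rounding of the input distances, leaf_V2_V3 matches the 1.2128... and internal matches the 0.7523... in the file.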
- Jarno
On Mon, 15 Oct 2007, Donna Toleno wrote:
> Hello list.
>
> When I make an R cluster dendrogram, it looks very different from the clustering in the Newick file displayed in Treeview (Rod Page's program). I tried a simple example with 12 probes and 3 samples, and I calculated the Euclidean distances both manually and with R.
>
>
>> library(ctc)
>> data
>          V1       V2       V3
> 1  4.184499 4.142575 4.017366
> 2  3.459849 3.455023 3.732115
> 3  8.287278 4.887692 5.007794
> 4  4.137224 4.523774 4.191996
> 5  4.431768 4.356945 4.570331
> 6  3.867442 3.931225 3.967566
> 7  3.480681 3.609997 3.522618
> 8  3.460785 3.966638 3.708675
> 9  4.306729 4.480724 4.399165
> 10 4.290001 4.036634 4.078688
> 11 6.707544 7.179901 9.475103
> 12 6.837264 6.845438 7.364477
>> hc <- hcluster(t(data), link = "ave")
>> write(hc2Newick(hc),file='hclust_12_probes_newick')
>> plot (hc)
>> hc
>
> Call:
> hcluster(x = t(data), link = "ave")
>
> Cluster method : average
> Distance : euclidean
> Number of objects: 3
>
> 'hclust_12_probes_newick' file contains:
> (V1:0.752346233726435,(V2:1.21282408894056,V3:1.21282408894056):0.752346233726435);
>
> I can see that the above Newick formatted tree shows that sample 2 and sample 3 are the appropriate distance apart, about 2.4, but where does the 0.7523... come from? How do I interpret "Height" on the y-axis of this dendrogram? I would like a tree that represents the expression difference. The Newick tree viewed in TreeView (Rod Page's Treeview) looks different from the dendrogram produced by hcluster, but the branch lengths still do not reflect the Euclidean distances. In my example, the Newick tree shows all three samples about equidistant from each other. Perhaps I should be using phylogenetic tree drawing to get the appropriate branch lengths from the Euclidean distances? I also experimented with hclust2treeview but this seems to refer to Michael Eisen's Treeview. I am not familiar with this program or the file formats it uses.
>
> Thank you for reading. Any comments will be appreciated.
>
> Euclidean distance manually calculated in Excel for all of the 12 probes:
>
>             V2          V3
> V1 3.508320996 4.352360295
> V2             2.425648178
>
>> distances.12.probes <- as.matrix(dist(t(data), method = "euclidean", diag = FALSE))
>> distances.12.probes
>          V1       V2       V3
> V1 0.000000 3.508321 4.352360
> V2 3.508321 0.000000 2.425648
> V3 4.352360 2.425648 0.000000
>
>
> Thank you again.
>
> -Donna
>
-----------------------------------------------------------------------------
Jarno Tuimala, PhD, bioinformatics, CSC, P.O.Box 405, FI-02101 Espoo, Finland
tel.: +358 9 457 2226, fax: +358 9 457 2302, e-mail: jarno.tuimala at csc.fi
CSC is the Finnish IT Center for Science, http://www.csc.fi/molbio