[BioC] linkage distances
Jarno Tuimala
jtuimala at csc.fi
Thu Jun 14 07:50:29 CEST 2007
Dear Daniel,
The ape package on CRAN contains a function (consensus) that one can use
for calculating a consensus of several dendrograms. It can produce either
a strict or a majority-rule consensus. A strict consensus contains only
the groups that are present in all the trees, whereas a majority-rule
consensus contains only the groups that are present in the majority of
the trees. I've usually used the majority-rule consensus, and it's the
standard method used with bootstrapping analyses.
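To make the two definitions concrete, here is a minimal base-R sketch (not the ape implementation) that counts how often each group occurs across dendrograms built with different agglomeration methods. The toy data, the choice of three methods, and the helper name groups_of are assumptions made only for illustration.

```r
# Sketch: strict vs. majority-rule consensus groups, using base R only.
set.seed(1)  # arbitrary seed so the toy data are reproducible
y <- matrix(rnorm(50), 10, 5,
            dimnames = list(paste("g", 1:10, sep = ""), paste("t", 1:5, sep = "")))
d <- dist(y, method = "euclidean")

# One dendrogram per agglomeration method (the choice of methods is an assumption)
methods <- c("complete", "average", "single")
trees <- lapply(methods, function(m) hclust(d, method = m))

# Hypothetical helper: every group (internal node) of a dendrogram,
# encoded as a comma-separated string of its sorted member labels
groups_of <- function(hc) {
  n <- nrow(hc$merge) + 1
  members <- vector("list", n - 1)
  for (i in seq_len(n - 1)) {
    # negative entries are single observations, positive ones earlier merges
    parts <- lapply(hc$merge[i, ], function(k)
      if (k < 0) hc$labels[-k] else members[[k]])
    members[[i]] <- sort(unlist(parts))
  }
  vapply(members, paste, character(1), collapse = ",")
}

# Fraction of trees in which each group occurs
freq <- table(unlist(lapply(trees, groups_of))) / length(trees)

strict   <- names(freq)[freq == 1]    # groups present in all trees
majority <- names(freq)[freq > 0.5]   # groups present in more than half of the trees
```

With ape installed, the packaged route should be to convert each hclust object with as.phylo() and call consensus() with p = 1 for the strict and p = 0.5 for the majority-rule consensus.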
Jarno
On Wed, 13 Jun 2007, Thomas Girke wrote:
> Dear Daniel,
>
> The only reference that I know that addresses this topic to some extent is
> this book:
> The Elements of Statistical Learning
> by T. Hastie, R. Tibshirani, J. H. Friedman
>
>
> With regard to William's suggestion: I don't have anything available that would
> calculate the consensus between different dendrograms. As a start to compute these
> comparisons, I would loop over the height component of the hclust objects
> with the cutree function. This way one can obtain all possible clusters
> defined by each dendrogram and then perform all-against-all consensus comparisons
> between different dendrograms using one of the set-matching functions (e.g. intersect or %in%).
>
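> # For example: simulate 10 genes x 5 samples and cluster the genes
> y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))
> hr <- hclust(dist(y, method = "euclidean"))
> # Cut the tree at every merge height; each column holds one set of cluster memberships
> sapply(hr$height, function(x) cutree(hr, h=x))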
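The all-against-all comparison described above could be sketched as follows. This is only one way to do it: the two agglomeration methods, the seed, and the helper names clusters_of and key are assumptions, not something from the original post.

```r
# Sketch: which clusters of one dendrogram also occur in another one?
set.seed(1)  # arbitrary seed so the toy data are reproducible
y <- matrix(rnorm(50), 10, 5,
            dimnames = list(paste("g", 1:10, sep = ""), paste("t", 1:5, sep = "")))
d <- dist(y, method = "euclidean")
hr1 <- hclust(d, method = "complete")
hr2 <- hclust(d, method = "average")

# Hypothetical helper: all distinct clusters obtained by cutting the tree
# at each of its merge heights, each cluster as a sorted character vector
clusters_of <- function(hc) {
  per_cut <- lapply(hc$height, function(x) {
    memb <- cutree(hc, h = x)     # named vector: label -> cluster id
    split(names(memb), memb)      # list of clusters for this cut
  })
  unique(lapply(unlist(per_cut, recursive = FALSE), sort))
}

# Hypothetical helper: encode each cluster as one string so %in% can match them
key <- function(cl) vapply(cl, paste, character(1), collapse = ",")

c1 <- clusters_of(hr1)
c2 <- clusters_of(hr2)

shared <- key(c1) %in% key(c2)  # TRUE where a cluster of hr1 also occurs in hr2
mean(shared)                    # fraction of hr1's clusters recovered by hr2
```

Note that singletons and the cluster containing everything are always shared, so one may want to drop those before reading much into the fraction.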
>
>
> Thomas
>
>
> On Wed 06/13/07 06:25, William Shannon wrote:
>> I tend to use a 'consensus' approach when doing cluster analysis. If by linkage distance you mean genetic linkage (I assume you do), you could try the various linkage distances and see if the dendrogram is stable. This also works if you are dealing with non-genetic distance measures.
>>
>> If you do this and the dendrograms are essentially stable you are done. More formal methods for consensus trees (dendrograms) can be found by searching for work by Fred McMorris (look in discrete math and evolutionary biology), and the numerical taxonomy software PAUP, I believe, has consensus methods in it.
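One informal way to check this kind of stability in R is to correlate the cophenetic distances of the dendrograms. The sketch below assumes that "linkage distances" means the agglomeration methods of hclust; the toy data, the seed, and the three methods are chosen only for illustration.

```r
# Sketch: how similar are the dendrograms from different agglomeration methods?
set.seed(1)  # arbitrary seed so the toy data are reproducible
y <- matrix(rnorm(50), 10, 5,
            dimnames = list(paste("g", 1:10, sep = ""), paste("t", 1:5, sep = "")))
d <- dist(y, method = "euclidean")

methods <- c("single", "complete", "average")  # assumed set of methods to compare
trees <- lapply(methods, function(m) hclust(d, method = m))
names(trees) <- methods

# Cophenetic distance = height at which two observations are first joined;
# one column of pairwise cophenetic distances per method
coph <- sapply(trees, function(hc) as.vector(cophenetic(hc)))

# Pairwise correlations between methods; values near 1 suggest stable dendrograms
stability <- cor(coph)
round(stability, 2)
```

What counts as "essentially stable" is a judgment call; the correlation only summarizes how similarly the trees order the pairwise merges.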
>>
>> Maybe Tom Girke has consensus tools in R/Bioconductor.
>>
>> Bill Shannon
>> Washington Univ. School of Medicine
>>
>> PS -- I am running for President elect of the Classification Society of North America and encourage anyone doing cluster/classification work to look at this society for their research and publications (Journal of Classification and http://www.classification-society.org/csna/csna.html)
>>
>>
>>
>> Daniel Brewer <daniel.brewer at icr.ac.uk> wrote:
>> Hi,
>>
>> I have been producing some dendrograms using hclust with a variety of
>> linkage distance measures. Does anyone know or is there a good resource
>> that explains why one would use one linkage distance rather than another?
>>
>> I don't really like dealing with dendrograms, but we want to produce
>> groupings based on these to do differential analysis on, and I would
>> like to be able to justify it.
>>
>> Thanks
>>
>> Dan
>>
>> --
>> **************************************************************
>> Daniel Brewer, Ph.D.
>> Institute of Cancer Research
>> Email: daniel.brewer at icr.ac.uk
>> **************************************************************
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> --
> Dr. Thomas Girke
> Assistant Professor of Bioinformatics
> Director, IIGB Bioinformatic Facility
> Center for Plant Cell Biology (CEPCEB)
> Institute for Integrative Genome Biology (IIGB)
> Department of Botany and Plant Sciences
> 1008 Noel T. Keen Hall
> University of California
> Riverside, CA 92521
>
> E-mail: thomas.girke at ucr.edu
> Website: http://faculty.ucr.edu/~tgirke
> Ph: 951-827-2469
> Fax: 951-827-4437
>
-----------------------------------------------------------------------------
Jarno Tuimala, PhD, bioinformatics, CSC, P.O.Box 405, FI-02101 Espoo, Finland
tel.: +358 9 457 2226, fax: +358 9 457 2302, e-mail: jarno.tuimala at csc.fi
CSC is the Finnish IT Center for Science, http://www.csc.fi/molbio