[BioC] linkage distances
Thomas Girke
thomas.girke at ucr.edu
Wed Jun 13 18:41:46 CEST 2007
Dear Daniel,
The only reference I know of that addresses this topic to some extent is
this book:
The Elements of Statistical Learning
by T. Hastie, R. Tibshirani, J. H. Friedman
With regard to William's suggestion: I don't have anything available that would
calculate the consensus between different dendrograms. As a start, I would loop
over the height component of each hclust object with the cutree function. This
yields all possible clusters defined by each dendrogram, which can then be
compared all-against-all between different dendrograms using one of the
set-comparison functions (e.g. %in% or intersect).
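# For example:
# Random 10 x 5 matrix with row labels g1..g10 and column labels t1..t5
y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))
# Hierarchical clustering on Euclidean distances (hclust defaults to complete linkage)
hr <- hclust(dist(y, method = "euclidean"))
# One column of cluster memberships per merge height in the dendrogram
sapply(hr$height, function(x) cutree(hr, h=x))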
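
The all-against-all comparison described above could be sketched roughly as follows. This is only a minimal illustration, not a tested consensus method: the helper name all_clusters, the fixed seed, the choice of "complete" versus "average" linkage, and the Jaccard-style summary are all assumptions made for the example. It cuts by k = 1..n rather than by height, which gives the same partitions as looping over hr$height plus the trivial one where every item is its own cluster.

```r
set.seed(1)  # assumption: fixed seed so the random example is reproducible
y <- matrix(rnorm(50), 10, 5,
            dimnames = list(paste("g", 1:10, sep = ""), paste("t", 1:5, sep = "")))
d <- dist(y, method = "euclidean")
hr1 <- hclust(d, method = "complete")  # two linkage methods chosen for illustration
hr2 <- hclust(d, method = "average")

# Hypothetical helper: return every cluster that appears when the tree is cut
# into k = 1..n groups, each encoded as one string of sorted member labels.
all_clusters <- function(hc) {
  n <- length(hc$order)
  memb <- cutree(hc, k = 1:n)  # matrix: one row per item, one column per k
  keys <- unlist(lapply(seq_len(n), function(k) {
    sapply(split(rownames(memb), memb[, k]),
           function(x) paste(sort(x), collapse = " "))
  }))
  unique(unname(keys))
}

clusters1 <- all_clusters(hr1)
clusters2 <- all_clusters(hr2)

# Clusters with identical membership in both dendrograms
shared <- intersect(clusters1, clusters2)

# Simple agreement score: shared clusters over all distinct clusters
jaccard <- length(shared) / length(union(clusters1, clusters2))
print(shared)
print(jaccard)
```

The singletons and the cluster containing all items are always shared, so the score never reaches zero; one may want to drop those trivial clusters before interpreting it.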
Thomas
On Wed 06/13/07 06:25, William Shannon wrote:
> I tend to use a 'consensus' approach when doing cluster analysis. If by linkage distance you mean genetic linkage (I assume you do), you could try the various linkage distances and see if the dendrogram is stable. This also works if you are dealing with non-genetic distance measures.
>
> If you do this and the dendrograms are essentially stable you are done. More formal methods of consensus trees (dendrograms) can be found doing a search on work by Fred McMorris (look in discrete math and evolutionary biology) and the numerical taxonomy software PAUP I believe has consensus methods in it.
>
> Maybe Tom Girke has consensus tools in R/Bioconductor.
>
> Bill Shannon
> Washington Univ. School of Medicine
>
> PS -- I am running for President elect of the Classification Society of North America and encourage anyone doing cluster/classification work to look at this society for their research and publications (Journal of Classification and http://www.classification-society.org/csna/csna.html)
>
>
>
> Daniel Brewer <daniel.brewer at icr.ac.uk> wrote:
> Hi,
>
> I have been producing some dendrograms using hclust with a variety of
> linkage distance measures. Does anyone know or is there a good resource
> that explains why one would use one linkage distance rather than another?
>
> I don't really like dealing with dendrograms, but we want to produce
> groupings based on these to do differential analysis on, and I would
> like to be able to justify it.
>
> Thanks
>
> Dan
>
> --
> **************************************************************
> Daniel Brewer, Ph.D.
> Institute of Cancer Research
> Email: daniel.brewer at icr.ac.uk
> **************************************************************
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Dr. Thomas Girke
Assistant Professor of Bioinformatics
Director, IIGB Bioinformatic Facility
Center for Plant Cell Biology (CEPCEB)
Institute for Integrative Genome Biology (IIGB)
Department of Botany and Plant Sciences
1008 Noel T. Keen Hall
University of California
Riverside, CA 92521
E-mail: thomas.girke at ucr.edu
Website: http://faculty.ucr.edu/~tgirke
Ph: 951-827-2469
Fax: 951-827-4437