Fwd: Re: Fwd: RE: [BioC] Heatmap function
Marcus
marcusb at biotech.kth.se
Thu Oct 30 09:31:08 MET 2003
>Hmm, your answer left me thinking about how to measure distances. Why
>doesn't a distance function just calculate the distance between the values
>that are there and leave out the NAs? Using the B-test, I have filtered
>out the spots that are supposedly not differentially expressed, so I
>have only a subset of the total number of spots. Three of my 18 slides
>have many NAs. Should I therefore exclude them because the distance is
>too affected?
>
>/ Marcus
>
>
>At 08:21 2003-10-30 +0100, you wrote:
>
>>>Subject: RE: [BioC] Heatmap function
>>>Date: Wed, 29 Oct 2003 13:55:43 -0500
>>>From: "Furge, Kyle" <Kyle.Furge at vai.org>
>>>To: "Marcus" <marcusb at biotech.kth.se>
>>>Cc: <bioconductor at stat.math.ethz.ch>
>>>
>>>This question has come up often at our institute... so here goes for a
>>>brief, informal explanation.
>>>
>>>Clustering depends on some type of distance metric to determine how the
>>>samples are related.
>>>
>>>If you have a pair of vectors:
>>>x <- c(1,NA,2,NA)
>>>y <- c(NA,2,NA,1)
>>>
>>>The pattern of missing values makes any type of distance between the
>>>vectors undefined: no position is observed in both x and y. The dist
>>>function returns NA for this type of comparison.
>>>
>>>Clustering functions don't accept NA: their job is just to organize the
>>>data by distance, and a distance of NA cannot be ranked against the others.
>>>
>>>What some programs do (like Eisen's cluster) is "threshold" the NA
>>>values to some arbitrary "large" distance.
>>>
>>>The following dist function computes distances and then replaces any NA
>>>values with an arbitrarily large distance (10% greater than the largest
>>>actual distance). It may be helpful for preparing input to hclust,
>>>because the NA values are replaced.
>>>
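>>>na.dist <- function(x,...) {
>>> # pairwise distances between rows; NA where no values overlap
>>> t.dist <- dist(x,...)
>>> t.dist <- as.matrix(t.dist)
>>> # arbitrary "large" distance: 10% above the largest observed one
>>> t.limit <- 1.1*max(t.dist,na.rm=TRUE)
>>> t.dist[is.na(t.dist)] <- t.limit
>>> # convert back to a dist object for hclust
>>> t.dist <- as.dist(t.dist)
>>> return(t.dist)
>>>}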
>>>
>>>I typed this from memory, so it may contain typos, but you get the idea.
>>>
>>>I hope this helps,
>>>
>>>-kyle
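
For anyone who wants to try this, here is a small sketch of the idea above. It assumes the behaviour documented for dist in current R: pairs with an NA on either side are left out and the sum is scaled up, and NA is returned only when no pair is left. The third row z is made up purely for illustration; na.dist is the function from the message above.

```r
# x and y are the vectors from the message; z is a hypothetical,
# fully observed row added only so that some distances can be computed.
m <- rbind(x = c(1, NA, 2, NA),
           y = c(NA, 2, NA, 1),
           z = c(2, 2, 4, 3))

d <- as.matrix(dist(m))
d["x", "y"]   # NA: x and y share no observed position
d["x", "z"]   # a number: computed from positions 1 and 3, then scaled up

# na.dist as posted above: replace NA distances with a value
# 10% larger than the largest distance that could be computed
na.dist <- function(x, ...) {
  t.dist <- as.matrix(dist(x, ...))
  t.limit <- 1.1 * max(t.dist, na.rm = TRUE)
  t.dist[is.na(t.dist)] <- t.limit
  as.dist(t.dist)
}

d2 <- na.dist(m)   # no NA left
hc <- hclust(d2)   # clustering now runs
```

So dist does leave out the NAs where it can; the replacement only matters for pairs of samples that have no values in common. The 1.1 factor is arbitrary, and slides with many NAs will still tend to end up far from everything else.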
*******************************************************************************************
Marcus Gry Björklund
Royal Institute of Technology
AlbaNova University Center
Stockholm Center for Physics, Astronomy and Biotechnology
Department of Molecular Biotechnology
106 91 Stockholm, Sweden
Phone (office): +46 8 553 783 39
Fax: + 46 8 553 784 81
Visiting address: Roslagstullsbacken 21, Floor 3
Delivery address: Roslagsvägen 30B