Fwd: Re: Fwd: RE: [BioC] Heatmap function

Marcus marcusb at biotech.kth.se
Thu Oct 30 09:31:08 MET 2003


>Hmm, your answer left me thinking about how to measure distances. Why 
>doesn't a distance function just calculate the distance between the values 
>that are there and leave out the NAs? I have used the B-test to filter 
>away the spots that are supposedly not differentially expressed, 
>and have only a subset of the total number of spots. Three of my 18 slides 
>have many NAs. Should I therefore exclude them because the distance is too 
>affected?
>
>/ Marcus
>
>
>At 08:21 2003-10-30 +0100, you wrote:
>
>>>Subject: RE: [BioC] Heatmap function
>>>Date: Wed, 29 Oct 2003 13:55:43 -0500
>>>From: "Furge, Kyle" <Kyle.Furge at vai.org>
>>>To: "Marcus" <marcusb at biotech.kth.se>
>>>Cc: <bioconductor at stat.math.ethz.ch>
>>>
>>>This question has come up often at our institute... so here goes for a 
>>>brief, informal explanation.
>>>
>>>Clustering depends on some type of distance metric to determine how the 
>>>samples are related.
>>>
>>>If you have a set of vectors :
>>>x <- c(1,NA,2,NA)
>>>y <- c(NA,2,NA,1)
>>>
>>>These two vectors have no position at which both are observed, so no 
>>>distance between them can be computed. The dist function returns NA 
>>>for this type of comparison.
>>>
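A minimal sketch of the point above, assuming the default Euclidean method of dist from the stats package. The vectors a and b are hypothetical and not from the original message. As documented for dist, positions where either value is missing are left out and the sum is scaled up for the positions that remain; NA is returned only when no position is observed in both vectors.

```r
# Vectors from the message: no position is observed in both
x <- c(1, NA, 2, NA)
y <- c(NA, 2, NA, 1)
d_none <- dist(rbind(x, y))   # nothing to compare, so the result is NA

# Hypothetical vectors that share two observed positions (1 and 4)
a <- c(1, NA, 2, 4)
b <- c(2, 2, NA, 4)
d_some <- dist(rbind(a, b))   # computed from the shared positions only

print(d_none)
print(d_some)
```

So a distance function does leave out the NAs where it can. The trouble starts only when a pair of samples has too few values in common, or none.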
>>>Clustering functions don't like NA because their job is just to organize 
>>>the data, and a distance of NA gives them nothing to order by.
>>>
>>>What some programs do (like Eisen's cluster) is "threshold" the NA 
>>>values to some arbitrary "large" distance.
>>>
>>>The following dist function computes distances and then replaces any NA 
>>>values with an arbitrarily large distance (10% greater than the largest 
>>>actual distance).  It may be helpful as input to hclust because the NA 
>>>values are replaced.
>>>
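>>># Like dist(), but NA distances are replaced by a large finite value
>>>na.dist <- function(x, ...) {
>>>  t.dist <- dist(x, ...)
>>>  t.dist <- as.matrix(t.dist)
>>>  # 10% above the largest distance that could be computed
>>>  t.limit <- 1.1 * max(t.dist, na.rm = TRUE)
>>>  t.dist[is.na(t.dist)] <- t.limit
>>>  # convert back to a dist object for hclust
>>>  t.dist <- as.dist(t.dist)
>>>  return(t.dist)
>>>}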
>>>
>>>I typed this from memory, so it may contain typos, but you see the idea.
>>>
>>>I hope this helps,
>>>
>>>-kyle
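
A short usage sketch for the na.dist function quoted above, repeated here so the example runs on its own. The matrix m is hypothetical toy data (4 samples as rows, 4 spots as columns), not data from this thread. Rows s1 and s2 share no observed position, so their distance is the one that gets replaced.

```r
# na.dist as given in the quoted message
na.dist <- function(x, ...) {
  t.dist <- as.matrix(dist(x, ...))
  t.limit <- 1.1 * max(t.dist, na.rm = TRUE)
  t.dist[is.na(t.dist)] <- t.limit
  as.dist(t.dist)
}

# Hypothetical toy data: s1 and s2 have no position observed in both
m <- rbind(s1 = c(1, NA, 2, NA),
           s2 = c(NA, 2, NA, 1),
           s3 = c(1, 2, 2, 1),
           s4 = c(3, 1, NA, 2))

d <- na.dist(m)    # the s1-s2 entry is now 1.1 times the largest real distance
hc <- hclust(d)    # hclust accepts d because it no longer contains NA

print(d)
print(hc$order)
```

The same function can presumably be passed to heatmap through its distfun argument, e.g. heatmap(m, distfun = na.dist), though that call is not exercised here. Note that the replacement value is arbitrary: it pushes samples with no shared spots far apart, whether or not they are truly dissimilar.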

*******************************************************************************************
Marcus Gry Björklund

Royal Institute of Technology
AlbaNova University Center
Stockholm Center for Physics, Astronomy and Biotechnology
Department of Molecular Biotechnology
106 91 Stockholm, Sweden

Phone (office): +46 8 553 783 39
Fax: + 46 8 553 784 81
Visiting address: Roslagstullsbacken 21, Floor 3
Delivery address: Roslagsvägen 30B


