>Hmm, your answer left me thinking about how to measure distances. Why
>doesn't a distance function just calculate the distance between the values
>that are there and leave out the NAs? I have used the B-test to filter away
>the spots that are supposedly not differentially expressed, so I have only a
>subset of the total number of spots. Three of my 18 slides have many NAs.
>Should I therefore exclude them, because their distances are too affected?
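
Base R's dist() already behaves roughly as Marcus describes, at least in current versions of R. Its documentation says that a column missing in either row is skipped. For the Euclidean, Manhattan, Canberra and Minkowski metrics, the sum is then scaled up in proportion to the number of columns actually used. Below is a minimal sketch with two made-up rows; the values are illustrative, not real array data:

```r
# Two made-up rows with scattered NAs (illustrative values only)
a <- c(1, 2, NA, 4)
b <- c(2, 2, 3, NA)

# Euclidean by default; only columns 1 and 2 are observed in both rows
d <- dist(rbind(a, b))

# Sum of squares over the shared columns is (1-2)^2 + (2-2)^2 = 1,
# rescaled by 4/2 (all columns / columns used) = 2, then square-rooted
as.vector(d)   # sqrt(2), about 1.414
```

The catch is that a distance involving a slide with many NAs is based on fewer spots than the others, even though it is rescaled to look comparable.
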
>
>/ Marcus
>
>
>>>
>>>This question has come up often at our institute, so here is a brief,
>>>informal explanation.
>>>
>>>Clustering depends on a distance metric to determine how closely the
>>>samples are related.
>>>
>>>Suppose you have two vectors:
>>>x <- c(1,NA,2,NA)
>>>y <- c(NA,2,NA,1)
>>>
>>>This pattern of missing values makes any distance between the vectors
>>>meaningless: there is no position where both have a value. The dist
>>>functions return NA for comparisons like this.
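
The example above can be checked directly in R. The sketch also defines a hypothetical helper, n.shared(), which is not part of base R. It counts how many columns are observed in both members of each pair of rows, which shows how much data each distance rests on.

```r
x <- c(1, NA, 2, NA)
y <- c(NA, 2, NA, 1)

# No position is observed in both vectors, so dist() has nothing to compare
d <- dist(rbind(x, y))
is.na(as.vector(d))   # TRUE

# Hypothetical helper (not base R): jointly observed columns per pair of rows
n.shared <- function(m) {
  obs <- 1 * !is.na(m)   # 1 where a value is present, 0 where it is NA
  obs %*% t(obs)         # entry [i, j] = columns observed in both row i and row j
}
s <- n.shared(rbind(x, y))
s   # diagonal 2 (each vector has two values), off-diagonal 0 (no overlap)
```
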
>>>
>>>Clustering functions such as hclust cannot handle NA: their job is only to
>>>organize the distances they are given, and an NA distance cannot be ranked
>>>against the others.
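
Here is a small sketch of that point, assuming base R's hclust(). When a dissimilarity object contains NA, hclust() stops with an error instead of guessing. The exact wording of the error message may differ between R versions, so only the fact that an error is raised is checked.

```r
# Toy data: rows a and b share no observed column, so their distance is NA
m <- rbind(a = c(1, NA, 2, NA),
           b = c(NA, 2, NA, 1),
           c = c(2, 2, 3, 1))
d <- dist(m)

# Capture the error instead of letting it abort the script
res <- tryCatch(hclust(d), error = function(e) e)
inherits(res, "error")   # TRUE
```
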
>>>
>>>Some programs (such as Eisen's Cluster) instead "threshold" the NA values
>>>to an arbitrary "large" distance.
>>>
>>>The following function computes distances with dist() and then replaces
>>>any NA with an arbitrarily large distance (10% greater than the largest
>>>actual distance). Because no NA values remain, its output can be passed
>>>directly to hclust.
>>>
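>>>na.dist <- function(x, ...) {
>>>  # ordinary distances; pairs with no jointly observed columns come back NA
>>>  t.dist <- as.matrix(dist(x, ...))
>>>  # arbitrary "large" distance: 10% above the largest observed distance
>>>  t.limit <- 1.1 * max(t.dist, na.rm = TRUE)
>>>  t.dist[is.na(t.dist)] <- t.limit
>>>  # convert back to a "dist" object, keeping the row labels
>>>  as.dist(t.dist)
>>>}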
>>>
>>>I typed this from memory, so it may contain typos, but you get the idea.
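
Here is a usage sketch for na.dist() feeding hclust(). The function is repeated so the snippet runs on its own. The three-row matrix is made-up toy data, and average linkage is just one choice of method.

```r
# na.dist() as in the message above, repeated so this snippet is self-contained
na.dist <- function(x, ...) {
  t.dist <- as.matrix(dist(x, ...))
  t.limit <- 1.1 * max(t.dist, na.rm = TRUE)   # 10% above the largest real distance
  t.dist[is.na(t.dist)] <- t.limit
  as.dist(t.dist)
}

# Toy data: a and b share no observed column; a-c = 2, b-c = 0 after rescaling
m <- rbind(a = c(1, NA, 2, NA),
           b = c(NA, 2, NA, 1),
           c = c(2, 2, 3, 1))
d <- na.dist(m)
round(as.matrix(d), 2)   # the a-b entry is 1.1 * 2 = 2.2

# No NA remains, so hclust() accepts it
hc <- hclust(d, method = "average")
hc$height   # b and c merge at 0; a joins at (2.2 + 2) / 2 = 2.1
```

Note that the substituted value depends on the largest distance in the data set. Adding or removing slides can therefore change where the "missing" pairs end up in the tree.
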
>>>
>>>I hope this helps,
>>>
>>>-kyle

