# [BioC] Heatmap function

Furge, Kyle Kyle.Furge at vai.org
Wed Oct 29 19:55:43 MET 2003

This question has come up often at our institute, so here is a brief, informal explanation.

Clustering relies on a distance metric to determine how closely the samples are related.
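
Because the quoted message below asks what `dist` means, here is a minimal sketch with a made-up three-sample matrix (the data and variable names are hypothetical). `dist` computes the pairwise distances between the rows of a matrix, and `hclust` builds the cluster tree from those distances. In current versions of R, both functions live in the stats package.

```r
# Hypothetical data: three samples (rows) measured on four genes (columns)
m <- rbind(a = c(1, 2, 3, 4),
           b = c(1, 2, 3, 5),
           c = c(4, 3, 2, 1))

# dist() returns the pairwise distances between the rows of m
d_euc <- dist(m)                        # default method = "euclidean"
d_man <- dist(m, method = "manhattan")  # sum of absolute differences
print(d_euc)
print(d_man)

# hclust() builds a tree from those distances; the closest pair
# (a and b, at distance 1) is merged first
hc <- hclust(d_euc, method = "complete")
print(hc$height)
```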

Suppose you have a pair of vectors:

```r
x <- c(1, NA, 2, NA)
y <- c(NA, 2, NA, 1)
```

The pattern of missing values makes any distance between these vectors meaningless: there is no position where both vectors have an observed value, so there is nothing to compare. The dist function returns NA for such comparisons.

Clustering functions cannot use an NA distance. Their job is only to organize the data according to the distances they are given, and a distance of NA says nothing about how two samples are related.
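
The two points above can be checked directly. In the sketch below, `x` and `y` are the vectors from the example, and the complete row `z` is a hypothetical addition. It shows that a matrix with no all-NA row or column can still produce an NA distance, which is the situation described in the quoted message. `hclust` then stops with an error rather than clustering.

```r
# The two vectors from the example above
x <- c(1, NA, 2, NA)
y <- c(NA, 2, NA, 1)

# dist() compares rows, so stack the vectors as the rows of a matrix.
# No position is observed in both rows, so the distance is NA.
d <- dist(rbind(x, y))
print(d)

# Hypothetical complete row z: x and y each have a defined distance to z,
# and neither is entirely NA, but the x-y distance is still undefined
m <- rbind(x, y, z = c(1, 2, 3, 4))
dm <- as.matrix(dist(m))
print(dm)

# hclust() cannot cluster with an NA distance and stops with an error;
# catch it here so the message can be inspected
res <- tryCatch(hclust(dist(m)), error = function(e) conditionMessage(e))
print(res)
```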

Some programs (such as Eisen's Cluster) instead "threshold" the NA values, replacing them with an arbitrary "large" distance.

The following function computes distances with dist and then replaces any NA values with an arbitrarily large distance (10% greater than the largest actual distance). Because no NA values remain, its output can be passed to hclust.

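```r
na.dist <- function(x, ...) {
  # Pairwise distances between the rows of x; extra arguments
  # (e.g. method = "manhattan") are passed on to dist()
  t.dist <- dist(x, ...)
  t.dist <- as.matrix(t.dist)
  # Arbitrary "large" distance: 10% above the largest observed distance
  t.limit <- 1.1 * max(t.dist, na.rm = TRUE)
  # Replace the undefined (NA) distances with that limit
  t.dist[is.na(t.dist)] <- t.limit
  t.dist <- as.dist(t.dist)
  return(t.dist)
}
```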

I typed this from memory, so it may contain typos, but you get the idea.
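
As a usage sketch (the toy matrix, seed, and gene/array names are hypothetical), `na.dist` can be passed to `heatmap` through its `distfun` argument. `heatmap` applies `distfun` to both the rows and the columns, then hands each result to `hclustfun`. This also bears on the distfun question quoted below: `distfun` only controls how distances are computed, while the linkage method (single, average, complete) is an argument of `hclust` and is changed through `hclustfun`. In current versions of R, `heatmap`, `dist`, and `hclust` are in the stats package rather than mva.

```r
# na.dist() from above, condensed so this block runs on its own
na.dist <- function(x, ...) {
  t.dist <- as.matrix(dist(x, ...))
  t.limit <- 1.1 * max(t.dist, na.rm = TRUE)
  t.dist[is.na(t.dist)] <- t.limit
  as.dist(t.dist)
}

# Hypothetical toy data: 6 genes (rows) x 4 arrays (columns), with NAs
# placed so that gene1 and gene2 share no observed array
set.seed(1)
m <- matrix(rnorm(24), nrow = 6,
            dimnames = list(paste0("gene", 1:6), paste0("array", 1:4)))
m["gene1", c("array2", "array4")] <- NA
m["gene2", c("array1", "array3")] <- NA

d <- na.dist(m)
print(any(is.na(d)))  # the undefined gene1-gene2 distance has been replaced

# distfun chooses the distance; hclustfun chooses the linkage method
heatmap(m,
        distfun   = na.dist,
        hclustfun = function(d) hclust(d, method = "average"))
```

Raising the limit to 1.1 times the largest observed distance keeps the replaced pairs from being merged early, so they join the tree near the top.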

I hope this helps,

-kyle

> -----Original Message-----
> From: Marcus [mailto:marcusb at biotech.kth.se]
> Sent: Wednesday, October 29, 2003 12:02 PM
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Heatmap function
>
> Hello. I am trying to visualize my data with the help of the heatmap function that you have written, and I have some questions. It would really help me a lot if someone could answer them.
>
> I have a microarray experiment consisting of 18 hybridizations, where I have taken the M-values of a subset of the genes most likely to be differentially expressed and put them into a matrix (the dimension is 18x3920; it is 3920 genes). It is on this matrix that I have tried to apply the heatmap function, but I get the error message
>
> Error in hclustfun(distfun(x)) : NA/NaN/Inf in foreign function call (arg 11)
>
> I guess this is due to the fact that I do not have a value for each gene on each chip in the subset. Due to experimental errors, some spots are only available on 2 of the 18 chips. But I tried this by making a test matrix with dimension 5x4. The only time I got the same error message was if an entire row or an entire column contained only NAs; otherwise it worked. I have no row or column in my matrix that consists entirely of NAs. So I wonder if anyone knows of any limitations that are not in the help file.
>
> Does anyone know of anywhere one could read about how to use the heatmap function or other functions in the mva package? I do not really understand how to change, for example, the distfun argument just by reading the help file. I guess the distfun = dist is either single, average or complete linkage for the hclust used in the heatmap, but how does one change the distfun argument? What does the function dist mean?
>
> Best regards
>
> / M
>
> *******************************************************************************************
> Marcus Gry Björklund
>
> Royal Institute of Technology
> AlbaNova University Center
> Stockholm Center for Physics, Astronomy and Biotechnology
> Department of Molecular Biotechnology 106 91 Stockholm, Sweden
>
> Phone (office): +46 8 553 783 39
> Fax: + 46 8 553 784 81
> Visiting address: Roslagstullsbacken 21, Floor 3