[BioC] clustering question

Mon Feb 20 05:23:05 CET 2006

I have a general question about clustering of genomic data. Heatmaps
are usually scaled row-wise, so that variation is apparent within rows
but not between rows. From the documentation of heatmap and hclust,
however, it appears that this scaling is applied only after the actual
clustering has been performed. If heatmap is run with scale="none", it
is apparent that most of the row clustering is based on overall gene
expression levels, not on similar column-wise behavior between rows.

Wouldn't it make sense to scale row-wise before clustering, so that the
row clusters reflect how similarly rows behave across columns, i.e. two
genes would be near each other if they behaved similarly across samples?
I realize that some of this effect may be achieved with unscaled data,
but it seems to me that the large overall expression differences may
mask it.
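
To make the question concrete, here is a minimal sketch in plain Python
(not the R heatmap/hclust code itself). The gene names and expression
values are made up for illustration, and Euclidean distance is assumed
as the clustering metric. It only checks which gene is the nearest
neighbor of geneA before and after row-wise scaling.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical toy data: three genes measured in four samples.
expr = {
    "geneA": [10.0, 12.0, 10.0, 12.0],      # low level, up in samples 2 and 4
    "geneB": [100.0, 120.0, 100.0, 120.0],  # same pattern as geneA, 10x the level
    "geneC": [12.0, 10.0, 12.0, 10.0],      # level like geneA, opposite pattern
}

def zscore(row):
    """Center a row to mean 0 and scale it to unit sample standard deviation."""
    m, s = mean(row), stdev(row)
    return [(v - m) / s for v in row]

def euclidean(a, b):
    """Euclidean distance between two equal-length rows."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(name, table):
    """Return the other gene with the smallest Euclidean distance to `name`."""
    others = [g for g in table if g != name]
    return min(others, key=lambda g: euclidean(table[name], table[g]))

# Row-wise scaling applied BEFORE any distances are computed.
scaled = {g: zscore(row) for g, row in expr.items()}

print(nearest("geneA", expr))    # geneC: raw distances follow overall level
print(nearest("geneA", scaled))  # geneB: scaled distances follow the pattern
```

On these numbers the raw distances pair geneA with geneC, while the
scaled distances pair it with geneB, which is the behavior I am asking
about.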

I hope this makes sense; I may not have used all of the correct
nomenclature.

Thanks,

Mark

Mark W. Kimpel MD
Department of Psychiatry
Indiana University School of Medicine
Biotechnology, Research, & Training Center
1345 W. 16th Street
Indianapolis, IN  46202