[BioC] Pearson Correlation & log transformed data.

Tue Apr 28 13:59:06 CEST 2009

Hi Folks,

Hopefully this will be easy for somebody to answer, but I'm interested
in clustering the expression profiles of genes from 8 timepoints using
Pearson Correlation. I'm using code like this:

# cor() correlates columns, so t(filtMat) gives gene-vs-gene correlations
dist_samples_pea <- as.dist(1 - cor(t(filtMat), method = "pearson"))
hc_samples_pea <- hclust(dist_samples_pea, method = "average")
plot(hc_samples_pea, hang = -1, ann = TRUE, cex = 0.75, main = "Pearson")

where filtMat is a matrix of my data (basically exprs(eset) with some
genesets removed). The code is from this document:

http://www.google.com/url?sa=U&start=1&q=http://www.giu.fi/portals/0/science/Courses/Microarrays/Practical%2520Bioinformatics%25202007/Exercises/Class%2520discovery%2520using%2520R%2520Bioconductor,%252012-4-2007_2.doc&ei=ee32Sdq-IILz-Ab4oOTBDw&usg=AFQjCNEqcWn-oQs5ggMAdZ_QZofs5P3W0g
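
For anyone who wants to run the snippet without my data, here is a minimal
sketch on a toy matrix. toyMat and its values are made up purely for
illustration (5 genes by 8 timepoints); it only shows that, with genes in
rows, the resulting tree has one leaf per gene.

```r
# Toy stand-in for filtMat (hypothetical values): 5 genes x 8 timepoints.
set.seed(1)
toyMat <- matrix(rnorm(5 * 8, mean = 8, sd = 1), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("t", 1:8)))

# cor() works on columns, so transpose to correlate genes with genes.
gene_cor  <- cor(t(toyMat), method = "pearson")
gene_dist <- as.dist(1 - gene_cor)
hc        <- hclust(gene_dist, method = "average")

dim(gene_cor)   # one row and one column per gene
hc$labels       # the leaves of the tree are the gene names
```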

My question is whether it matters that I'm using log-transformed data.
I know that the log transform is not linear, so Pearson correlations
(and therefore the clusters) computed from logged data can differ from
those computed from raw data. I'd very much appreciate it if somebody
could justify one or the other course of action.
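
To make the concern concrete, here is a minimal sketch with made-up
numbers. The matrix raw and the gene names g1 to g3 are hypothetical, not
taken from my data. It compares the gene-by-gene Pearson correlations on
the raw and log2 scales, and checks that a purely linear rescaling leaves
them unchanged.

```r
# Hypothetical raw intensities for three genes over 8 timepoints.
raw <- rbind(
  g1 = c(100, 200, 400, 800, 1600, 3200, 6400, 12800),  # doubles each step
  g2 = c(10, 20, 30, 40, 50, 60, 70, 80),               # linear increase
  g3 = c(5, 9, 20, 38, 85, 160, 330, 640)               # roughly doubling
)

cor_raw <- cor(t(raw), method = "pearson")
cor_log <- cor(t(log2(raw)), method = "pearson")

# A linear change of scale does not alter Pearson correlation ...
cor_lin <- cor(t(2 * raw + 5), method = "pearson")
isTRUE(all.equal(cor_raw, cor_lin))

# ... but the (non-linear) log transform does.
isTRUE(all.equal(cor_raw, cor_log))

round(cor_raw, 3)
round(cor_log, 3)
```

Since the distance passed to hclust() is 1 minus these correlations, any
change in the correlation matrix can change the resulting tree.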

Thanks a bunch,

Paul.

-- 
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland