[BioC] Pearson Correlation & log transformed data.
    Paul Geeleher 
    paulgeeleher at gmail.com
       
    Tue Apr 28 13:59:06 CEST 2009
    
    
  
Hi Folks,
Hopefully this will be easy for somebody to answer. I'm interested in
clustering the expression profiles of genes across 8 timepoints using
Pearson correlation. I'm using code like this:
dist_samples_pea <- as.dist(1 - cor(t(filtMat), method = "pearson"))
hc_samples_pea <- hclust(dist_samples_pea, method = "average")
plot(hc_samples_pea, hang = -1, ann = TRUE, cex = 0.75, main = "Pearson")
where filtMat is a matrix of my data (basically exprs(eset) with some
gene sets removed). The code is from this document:
http://www.google.com/url?sa=U&start=1&q=http://www.giu.fi/portals/0/science/Courses/Microarrays/Practical%2520Bioinformatics%25202007/Exercises/Class%2520discovery%2520using%2520R%2520Bioconductor,%252012-4-2007_2.doc&ei=ee32Sdq-IILz-Ab4oOTBDw&usg=AFQjCNEqcWn-oQs5ggMAdZ_QZofs5P3W0g
My question is whether it matters that I'm using the log-transformed
data. Pearson correlation is unchanged by linear rescaling of the data,
but the log transform is not linear, so logged data and raw data can
yield different clusters. I'd very much appreciate it if somebody could
justify one or the other course of action.
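For illustration, here is a minimal sketch of how the two options could be compared side by side. It runs on simulated log-normal values rather than the real filtMat, and the names rawMat and logMat, the 50-gene size, and the cut at k = 4 are all assumptions made for the example, not part of the code above.

```r
# Simulated stand-in for raw-scale intensities: 50 genes x 8 timepoints.
# (Assumption: the real filtMat is not available here.)
set.seed(1)
rawMat <- matrix(rlnorm(50 * 8, meanlog = 7, sdlog = 1), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("t", 1:8)))
logMat <- log2(rawMat)

# Pearson correlation between gene profiles (rows of the matrix)
cor_raw <- cor(t(rawMat), method = "pearson")
cor_log <- cor(t(logMat), method = "pearson")

# A positive linear rescaling leaves every correlation unchanged ...
cor_lin <- cor(t(3 * rawMat + 10), method = "pearson")
lin_invariant <- isTRUE(all.equal(cor_raw, cor_lin))

# ... whereas the log transform changes them
max_diff <- max(abs(cor_raw - cor_log))

# Average-linkage clustering on 1 - correlation for both scales
hc_raw <- hclust(as.dist(1 - cor_raw), method = "average")
hc_log <- hclust(as.dist(1 - cor_log), method = "average")

# Cross-tabulate cluster membership at an arbitrary cut of k = 4
print(table(raw = cutree(hc_raw, k = 4), log = cutree(hc_log, k = 4)))
```

If the two scales gave the same grouping, each row of the cross-tabulation would have a single non-zero cell; spread across several cells indicates that the choice of scale changes the clusters.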
Thanks a bunch,
Paul.
-- 
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
    
    