[BioC] correlation between M values of replicate arrays

Wed Feb 7 08:37:31 CET 2007

Thanks for both explanations on this subject. You guessed correctly that my experiment uses a reference design! OK, now I am a bit more relieved and aware of what should be expected.

Best regards

João Fadista
Ph.d. student

UNIVERSITY OF AARHUS
Faculty of Agricultural Sciences
Research Centre Foulum
Dept. of Genetics and Biotechnology
Blichers Allé 20, P.O. BOX 50
DK-8830 Tjele

Phone:   +45 8999 1900
Direct:  +45 8999 1900

E-mail:  Joao.Fadista at agrsci.dk
Web:	   http://www.agrsci.org				

-----Original Message-----
From: Wolfgang Huber [mailto:huber at ebi.ac.uk] 
Sent: Wednesday, February 07, 2007 12:27 AM
To: Claus Mayer
Cc: João Fadista; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] correlation between M values of replicate arrays

Dear João,

just to add to Claus' excellent explanation: high correlation between replicates is NOT the same as good data quality. Otherwise, a normalisation method that replaced each probe's M value with a fixed number depending only on the gene name would be the best(*). What you want is an improvement in mean squared error, MSE = bias² + variance. See also http://en.wikipedia.org/wiki/Mean_squared_error

(*) Btw, this is no joke: on oligonucleotide arrays, the dependence of the background (unspecific) signal on GC content and other sequence features can have exactly this effect.
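
To make the point concrete, here is a minimal simulation sketch in plain Python (not limma code). It assumes a reference design in which no gene is truly differentially expressed, and a made-up linear intensity-dependent dye bias that is identical on both replicate arrays; the function names, the bias slope and the noise level are all hypothetical choices for illustration. The "normalisation" simply subtracts the known bias curve, whereas a real loess fit would have to estimate it from the data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mse(estimate, truth):
    """Mean squared error of the estimated M values against the true ones."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(estimate)


def simulate(n_genes=2000, noise_sd=0.3, seed=1):
    rng = random.Random(seed)

    # Assumption: no gene is differentially expressed, so every true M is 0.
    truth = [0.0] * n_genes

    # Average log-intensity A of each spot, shared by both replicate arrays.
    a_values = [rng.uniform(6.0, 14.0) for _ in range(n_genes)]

    # Hypothetical intensity-dependent dye bias. It is the same on both
    # arrays because the dye orientation is the same (reference design).
    bias = [0.25 * (a - 10.0) for a in a_values]

    # Raw M = true M + shared dye bias + independent measurement noise.
    raw1 = [t + b + rng.gauss(0.0, noise_sd) for t, b in zip(truth, bias)]
    raw2 = [t + b + rng.gauss(0.0, noise_sd) for t, b in zip(truth, bias)]

    # Idealised normalisation: subtract the (here known) bias curve.
    norm1 = [m - b for m, b in zip(raw1, bias)]
    norm2 = [m - b for m, b in zip(raw2, bias)]

    return {
        "cor_raw": pearson(raw1, raw2),
        "cor_norm": pearson(norm1, norm2),
        "mse_raw": mse(raw1, truth),
        "mse_norm": mse(norm1, truth),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the raw replicates correlate strongly only because they share the same bias, while the normalised replicates are nearly uncorrelated and yet have the smaller MSE.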

  Best wishes
  Wolfgang

Claus Mayer wrote:
> Dear João!
> 
> Most normalisation methods assume that the majority of genes are not 
> differentially expressed, i.e. that their expected M value is 0. If 
> this assumption is correct, properly normalized data will show only 
> weak correlation between M values from different arrays, so observing 
> this in your normalised data is not necessarily a reason to worry.
> 
> There are different reasons why the unnormalized arrays might show 
> higher correlations. The most obvious situation that comes to my mind 
> is if you use something like a reference design, i.e. you always have a 
> control on dye1 and the treatment sample on dye2. The 
> intensity-dependent dye bias (which you try to remove with loess 
> normalisation) will then automatically lead to correlated M values.
> It is an unwanted correlation though, caused by a systematic bias, so 
> the normalized data with less correlation are "better" in this case.
> There will be other scenarios where something like that happens, but 
> without knowing details about your experiment it makes little sense to 
> speculate about them.
> 
> Hope that helps
> 
> Claus
> 
> João Fadista wrote:
>> Dear all,
>>
>> I have some questions that I would like to pose to this list.
>>
>> When I normalize microarray data (usually with the methods in the 
>> normalizeWithinArrays function in the limma package) I decrease the 
>> correlation between the M values of my replicate arrays. This 
>> obviously has an explanation, because if we normalize "within" arrays, 
>> the differences between them tend to become larger. Therefore, if the 
>> correlation between replicates decreases, it seems that by 
>> normalizing our data we get "worse" data.
>>
>> Is this true? Does the same happen to you? And how do you deal 
>> with that?
>>
>>
>>
>> Regards
>>
>> João Fadista, Ph.d. student
>>