[BioC] correlation between M values of replicate arrays

Wed Feb 7 00:27:20 CET 2007

Dear João,

just to add to Claus' excellent explanation: high correlation between
replicates is NOT the same as good data quality. Otherwise, a
normalisation method that replaced each probe's M value by a fixed
number depending only on the gene name would be the best one(*), since
it makes all replicates agree perfectly. What you want is an improvement
in the mean squared error, MSE = bias² + variance. See, e.g.,
http://en.wikipedia.org/wiki/Mean_squared_error

(*) By the way, this is no joke: on oligonucleotide arrays, the dependence
of the background (non-specific) signal on GC content and other sequence
features can have exactly this effect.

  Best wishes
  Wolfgang

Claus Mayer wrote:
> Dear João!
> 
> Most normalisation methods assume that the majority of genes are not
> differentially expressed, i.e. that their expected M value is 0. If this
> assumption is correct, properly normalised data will show only weak
> correlation between the M values of different arrays, so observing this
> in your normalised data is not necessarily a reason to worry.
> 
> There are different reasons why the unnormalised arrays might show
> higher correlations. The most obvious situation that comes to my mind is
> a reference design, i.e. you always have the control on dye 1 and the
> treatment sample on dye 2. The intensity-dependent dye bias (which you
> try to remove with loess normalisation) is then the same on every array
> and will automatically lead to correlated M values.
> This correlation is unwanted, though, because it is caused by a
> systematic bias, so the normalised data with less correlation are
> "better" in this case.
> There will be other scenarios where something similar happens, but
> without knowing the details of your experiment it makes little sense to
> speculate about them.
> 
> Hope that helps
> 
> Claus
> 
> João Fadista wrote:
>> Dear all,
>>
>> I have some questions that I would like to pose to this list.
>>
>> When I normalise microarray data (usually with the methods in the
>> normalizeWithinArrays function of the limma package), the correlation
>> between the M values of my replicate arrays decreases. This
>> obviously has an explanation: because we normalise "within" arrays,
>> the differences between arrays tend to become larger. But since the
>> correlation between replicates decreases, it seems as if normalising
>> our data makes it "worse".
>>
>> Is this true? Does the same happen to you? And how do you deal
>> with it?
>>
>>
>>
>> Regards
>>
>> João Fadista, Ph.D. student
>>