[BioC] Treatment of Duplicate spots

Pita pwilkinson_m at xbioinformatics.org
Wed Feb 9 17:12:08 CET 2005


At 08:05 PM 2/7/2005, Matthew Ritchie wrote:



>The information from the duplicate spots can be summarised using lmFit() 
>with the appropriate arguments.  The approach taken in limma is to assume 
>that the duplicate spots are correlated by being on the same array, a 
>fixed distance apart (the function duplicateCorrelation() is used to 
>estimate this correlation).  An alternative approach would be to average 
>the duplicate log-ratios prior to fitting the linear model.
>
>>For the case of duplicate spotting, what is the significance of merging 
>>the raw channels separately prior to creating MA values with the loess 
>>normalization, then between-chip scaling?
>
>I'm not sure what you mean here.  There are usually two channels per array 
>for two-colour microarrays.  Do you mean create 4 channels per array, one 
>for each duplicate set in each channel?  I'm not sure that this would be 
>helpful.


Actually, my bad.  I meant merging the duplicate spots WITHIN each raw 
channel separately PRIOR to calculating the log-ratios (M-values). The 
duplicate spots on our arrays correlate very, very well, to the point where 
I think that spotting probes twice seems wasteful (IMHO the duplicate spots 
would need to be randomly distributed for duplicate spotting to be 
meaningful, but the spotting technology is not capable of doing this).

I like the idea of using quantile normalization between chips; assuming 
n spots for m genes, that will be fine. However, when there are duplicate 
spots for each probe, each duplicate spot is adjusted independently, and 
when I compared the M-values with the raw R and G channel duplicates, the 
correlation between the duplicate M-values was quite poor. I suspect this 
is because quantile normalization handles each duplicate spot separately.
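
To make the point about independent adjustment concrete, here is a minimal 
Python sketch of between-array quantile normalization on toy data. It is an 
illustration only, not limma's normalizeBetweenArrays() code: the function 
name quantile_normalize and the numbers are made up, and ties are not 
handled. Each value is replaced according to its rank within its own array, 
so the two duplicates of a probe are mapped without reference to each other.

```python
def quantile_normalize(columns):
    """Give every column (one per array or channel) the same distribution.

    Hypothetical helper for illustration; assumes equal-length columns
    and no tied values.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across columns.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n)]
    normalized = []
    for col in columns:
        # Positions of the spots ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            # Each spot gets the reference value for its own rank only.
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Toy log-intensities for two arrays, four spots each.
arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0]]
normalized = quantile_normalize(arrays)
```

After this step both arrays contain exactly the same set of values, only 
assigned to different spots.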

So my question is: do I gain or lose by merging the raw duplicate values 
within the R and G channels separately prior to calculating the M-values? 
I am not enough of an expert in statistics to say whether or not this is 
acceptable.
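
The two options can be compared numerically. The following Python sketch 
uses made-up intensities and hypothetical function names, and assumes 
"merging" means taking the arithmetic mean of the duplicates within each 
channel. Averaging the duplicate log-ratios is the same as taking the 
log-ratio of geometric means, so the two routes agree when the duplicates 
agree and drift apart when they do not.

```python
import math


def m_from_merged_channels(r_dups, g_dups):
    """Average duplicates within each raw channel, then take one log-ratio."""
    r_mean = sum(r_dups) / len(r_dups)
    g_mean = sum(g_dups) / len(g_dups)
    return math.log2(r_mean / g_mean)


def m_from_averaged_log_ratios(r_dups, g_dups):
    """Compute one log-ratio per duplicate spot, then average them."""
    ms = [math.log2(r / g) for r, g in zip(r_dups, g_dups)]
    return sum(ms) / len(ms)


# Duplicates that agree: both routes give the same M-value (2.0).
r_same, g_same = [800.0, 800.0], [200.0, 200.0]
m_same_merged = m_from_merged_channels(r_same, g_same)
m_same_averaged = m_from_averaged_log_ratios(r_same, g_same)

# Duplicates that disagree: the routes differ (about 1.32 versus 1.0).
r_diff, g_diff = [1200.0, 300.0], [300.0, 300.0]
m_diff_merged = m_from_merged_channels(r_diff, g_diff)
m_diff_averaged = m_from_averaged_log_ratios(r_diff, g_diff)
```

Merging on the raw scale also lets the brighter duplicate dominate, which 
is one reason the order of the two steps matters.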




>>How many spots in a chip would be required to run quantile normalization 
>>vs scale normalization when using normalizeBetweenArrays?
>
>The lower limit for quantile normalization is 2 spots, and for scale 
>normalization it's 1 spot.  Normalization is probably not such a big deal 
>with so few spots though ;)

Yes, if I only had 2 good spots I would generally be unhappy with 
microarrays. But it seems that I need to use scale normalization for small 
chips, like 300 spots, and quantile for large arrays like 19k, because with 
such a large number of points, scale normalization may force more genes into 
the tails of the distribution of M-values, as you would see in the 
box-plots.
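
For comparison with quantile normalization, here is a rough Python sketch of 
scale normalization of M-values. It assumes that "scale" means rescaling 
each array so that all arrays share the same median absolute M-value, with 
the geometric mean of the per-array values as the common target. That is my 
reading of the method, not limma's actual code, and the function name and 
data are made up.

```python
import math
from statistics import median


def scale_normalize(m_columns):
    """Rescale each array's M-values to a common median absolute value.

    Hypothetical helper; assumes no array has a median absolute value of 0.
    """
    med_abs = [median(abs(v) for v in col) for col in m_columns]
    # Common target: geometric mean of the per-array spreads.
    target = math.exp(sum(math.log(s) for s in med_abs) / len(med_abs))
    return [[v * target / s for v in col]
            for col, s in zip(m_columns, med_abs)]


# Two toy arrays with the same shape of distribution but different spread.
m_values = [[-1.0, 0.5, 2.0], [-4.0, 2.0, 8.0]]
scaled = scale_normalize(m_values)
```

Every M-value on an array is multiplied by the same factor, so the shape of 
each array's distribution, including its tails, is preserved rather than 
forced onto a common reference.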

Thanks for the help

Peter


