[BioC] Question about quantile normalization in normalize.AffyBatch.quantiles

Fri Apr 23 03:32:44 CEST 2010

Hi Bioconductor community,

My understanding of the rma method is that it consists of three
steps:  background correction, quantile normalization, and probe
summarization.  My question concerns the quantile-normalized probe
intensities returned by the normalize.AffyBatch.quantiles function in
step 2.  According to Bolstad et al. (Bioinformatics 2003, 19(2):185-93),
where the quantile normalization algorithm is described, the sorted
vector of probe intensities should be identical for every array after
quantile normalization.  However, comparing the sorted probe intensities
(in this example, for the first and second arrays) shows that they are
not equal, and plotting the differences reveals an odd pattern.  The
differences are quite small overall and barely affect the higher
intensity probes, but I am still curious.  Can anyone explain what is
going on here?  (A working example is provided below.)
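For comparison, here is a minimal base-R sketch of the algorithm as described in the paper: take the mean of the k-th smallest value across arrays as the target, then give every value the target at its within-array rank. The helper quantile_normalize is hypothetical (not part of affy), and it breaks ties by order of appearance, which is an assumption. With continuous simulated data the sorted columns come out identical, as the paper predicts.

```r
## Hypothetical helper: quantile normalization per Bolstad et al. (2003),
## with ties broken by position (ties.method = "first").
quantile_normalize <- function(x) {
  # within-column rank of each value
  ranks <- apply(x, 2, rank, ties.method = "first")
  # target distribution: mean of the k-th smallest value across columns
  target <- rowMeans(apply(x, 2, sort))
  # replace each value by the target value at its rank
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
x <- matrix(rexp(3000, rate = 0.01), ncol = 3)  # simulated, tie-free "intensities"
xn <- quantile_normalize(x)
identical(sort(xn[, 1]), sort(xn[, 2]))
## [1] TRUE
```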

Thanks,
Owen

library(affy)  ## provides bg.correct() and normalize.AffyBatch.quantiles()
library(CLL)
data("CLLbatch")
step1 <- bg.correct(CLLbatch, "rma")
step2 <- normalize.AffyBatch.quantiles(step1, type="pmonly")
## step3 <- computeExprSet(step2, "pmonly", "medianpolish")
## I have verified that the three steps above are essentially
## equivalent to rma(CLLbatch).
## There is no need to run step 3 for the following examples.

a <- sort(pm(step2)[,1])
b <- sort(pm(step2)[,2])
z <- a-b

## most of the values are not identical...
sum(z != 0) / length(z)
## [1] 0.9299108

## ...but the differences fluctuate around zero in an oddly
## symmetrical manner...
plot(a, z)

## ...and zooming in shows that the differences come in groups of probes.
plot(z[1:10000])

## also, relative to the intensity, the differences never exceed 3%
## and diminish at higher probe intensities
plot(a, z/a)
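One hypothesis worth checking (not confirmed here): if many intensities within an array are tied, an implementation that gives all tied values a single shared output, instead of breaking the ties, cannot produce identical sorted columns, and the mismatches would appear in runs wherever the tie groups of two arrays fall at different ranks. The sketch below assumes tied values receive the target interpolated at their average rank; quantile_normalize_ties is a hypothetical helper, and whether normalize.AffyBatch.quantiles handles ties this way is an assumption. The simulated data are rounded to one decimal place only to create ties.

```r
## Hypothetical helper: quantile normalization where tied values share their
## average rank and get the target linearly interpolated at that rank.
## This tie rule is an assumption, not necessarily what affy does.
quantile_normalize_ties <- function(x) {
  target <- rowMeans(apply(x, 2, sort))  # mean of k-th smallest across columns
  n <- nrow(x)
  apply(x, 2, function(col) {
    r <- rank(col, ties.method = "average")  # tied values share a mid-rank
    approx(seq_len(n), target, xout = r)$y   # interpolate between target quantiles
  })
}

set.seed(1)
## rounding makes many values within each column tied
x <- matrix(round(rexp(3000, rate = 0.5), 1), ncol = 3)
xn <- quantile_normalize_ties(x)
z <- sort(xn[, 1]) - sort(xn[, 2])
mean(z != 0)            # fraction of positions where sorted columns differ
plot(sort(xn[, 1]), z)  # compare with plot(a, z) above
```

On the real data, something like sum(duplicated(pm(step1)[, 1])) would indicate how common ties are before normalization.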