[BioC] Question about quantile normalization in normalize.AffyBatch.quantiles
Ben Bolstad
bmb at bmbolstad.com
Fri Apr 23 16:02:00 CEST 2010
Almost certainly what you are seeing is a reflection of the fact that
the quantile normalization code underlying normalize.AffyBatch.quantiles
is designed to handle ties. Basically, the algorithm attempts to ensure
that data values that are exactly equal on input within a specific array
(column) are also exactly equal on output. Because each array has its
own pattern of ties, the sorted normalized values need not match exactly
from one array to the next. The algorithmic description in the paper
skates over the issue of how to appropriately deal with ties.
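To make the effect concrete, here is a minimal sketch in plain R. It is
not the C code behind normalize.AffyBatch.quantiles; the function names
qnorm_naive and qnorm_ties and the toy matrix are made up for
illustration, and the tie rule used (average ranks, with a fractional
rank mapped to the mean of the two neighbouring target values) is only
one simple way to keep tied inputs tied on output.

```r
## Toy data: two "arrays" (columns), each with one pair of tied values
x <- cbind(c(2, 2, 5, 7), c(1, 4, 6, 6))

## Target distribution: mean of the sorted columns, row by row
target <- rowMeans(apply(x, 2, sort))

## Naive version: ties are broken by position, so equal inputs
## can end up with different outputs
qnorm_naive <- function(x, target) {
  apply(x, 2, function(col) target[rank(col, ties.method = "first")])
}

## Tie-aware sketch (an assumption, not the package's implementation):
## tied inputs share an average rank, and a fractional rank is mapped
## to the mean of the two neighbouring target values
qnorm_ties <- function(x, target) {
  apply(x, 2, function(col) {
    r <- rank(col, ties.method = "average")
    (target[floor(r)] + target[ceiling(r)]) / 2
  })
}

naive <- qnorm_naive(x, target)
tied  <- qnorm_ties(x, target)

## Naive: sorted columns are identical, but the tied inputs are split
identical(sort(naive[, 1]), sort(naive[, 2]))
naive[1, 1] == naive[2, 1]

## Tie-aware: tied inputs stay tied, so sorted columns no longer match
tied[1, 1] == tied[2, 1]
sort(tied[, 1]) - sort(tied[, 2])
```

In the tie-aware version the differences between sorted columns are
nonzero only around the positions where one of the columns has ties,
which is in line with the small, grouped differences described in the
question.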
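For the data below there are plenty of ties: the first array has only
7270 distinct values among its 201800 PM intensities, both before and
after normalization (a is as defined in your code below):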
> length(unique(a))
[1] 7270
> length(unique(sort(pm(step1)[,1])))
[1] 7270
> length(a)
[1] 201800
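The tie-preservation property itself can also be checked directly. The
helper below is a hypothetical sketch (ties_preserved is not a function
from affy or any other package): it returns TRUE when every group of
equal input values has exactly one distinct output value.

```r
## Hypothetical helper: TRUE if every group of equal inputs maps to
## a single output value
ties_preserved <- function(input, output) {
  n_out_per_in <- tapply(output, input, function(v) length(unique(v)))
  all(n_out_per_in == 1)
}

## Toy check: tied inputs (2, 2) share one output in the first call
## and are split across two outputs in the second
kept  <- ties_preserved(c(2, 2, 5, 7), c(2.25, 2.25, 5.5, 6.5))
split <- ties_preserved(c(2, 2, 5, 7), c(1.5, 3, 5.5, 6.5))
```

With the objects from the example in this thread, the corresponding
call for the first array would be
ties_preserved(pm(step1)[, 1], pm(step2)[, 1]).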
On Thu, 2010-04-22 at 18:32 -0700, Owen Solberg wrote:
> Hi Bioconductor community,
>
> My understanding of the rma method is that it is composed of the
> following three steps: background correction, quantile normalization,
> and probe summarization. My question concerns the quantile normalized
> probe intensities that are returned by the
> normalize.AffyBatch.quantiles function, in step 2. According to
> Bolstad et al (Bioinformatics. 2003 19(2):185-93), in which the quantile
> normalization algorithm is described, the vectors of sorted probe
> intensities across all arrays should be equal after quantile
> normalization. However, comparing the sorted probe intensities (in
> this example, for the first and second arrays) shows that they are not
> equal, and furthermore, plotting the differences reveals an odd
> pattern to the differences. The differences are quite small overall,
> and barely affect the higher intensity probes, but I am still curious.
> Can anyone explain what is going on here? (working example provided
> below)
>
> Thanks,
> Owen
>
>
> library(CLL)
> data("CLLbatch")
> step1 <- bg.correct(CLLbatch, "rma")
> step2 <- normalize.AffyBatch.quantiles(step1, type="pmonly")
> ## step3 <- computeExprSet(step2, "pmonly", "medianpolish")
> ## I have verified that the above 3 steps are essentially equivalent
> ## to rma(CLLbatch)
> ## no need to run the 3rd step for the following examples
>
> a <- sort(pm(step2)[,1])
> b <- sort(pm(step2)[,2])
> z <- a-b
>
> ## most of the values are not identical...
> sum(!z==0)/length(z)
> [1] 0.9299108
>
> ## ...but the differences fluctuate around zero in an oddly
> ## symmetrical manner...
> plot(a, z)
>
> ## ...and zooming in shows that the differences come in groups of probes.
> plot(z[1:10000])
>
> ## also, plotted as a percentage of the intensity, the differences are
> ## never over 3%, and diminish at higher probe intensities
> plot(a, z/a)
>