[BioC] RMA/QuantileNormalization results difference between oligo and aroma.affymetrix for Hugene

Henrik Bengtsson hb at stat.berkeley.edu
Fri Feb 26 13:47:35 CET 2010


Hi,

you concerns about reproducibility are very important.  Luckily your
observations are based on mistakes as explained below.

On Fri, Feb 26, 2010 at 12:05 PM, Benilton Carvalho
<beniltoncarvalho at gmail.com> wrote:
> Quantile normalization is already one step in the RMA workflow.
> Therefore, there's no need to normalize the data again once you've
> gone RMA, ie. (regarding oligo) your call
> "normalize.quantiles(exprs(rmadata))" should be dropped.
>
> Using the defaults, rma() in oligo will:
>
> 1) Background correct (via the RMA convolution model)
> 2) Quantile normalize
> 3) Summarize via median-polish.

Yes, as Benilton points out it seems like you've misunderstood what
RMA does.  The author of RMA (Ben Bolstad) defines the term RMA to
mean the complete preprocessing suite including summarization.

>
> b
>
> On Fri, Feb 26, 2010 at 10:46 AM, Mikhail Pachkov <pachkov at gmail.com> wrote:
>> Dear All,
>>
>> I am new in microarray analysis and need your expertise.
>> I need to develop procedure for producing expression values from CEL
>> files. Data should processed with RMA and quantile normalized. I have
>> tried two packages - oligo and aroma.affymetrix. Obtained results are
>> quite different form my point of view. Moreover
>> aroma.affymetrix::QuantileNormalization function produce dta which do
>> not look like they were quantile normalized.

What is 'dta'?

>>  I have made density plots of data after RMA and after quantile
>> normalization which are available here
>> http://test.swissregulon.unibas.ch/bioc/index.html There are also
>> links to two CEL files I have used.
>>
>> I have a few questions:

Below, I will take that you mean "RMA background correct" when you say "RMA".

>> Why RMA results are so different?

The RMA-style background correction in aroma.affymetrix utilizes
affy::bg.adjust() [and normalizes PM probes only].  I'm not sure what
algorithm/implementation oligo is using for this step, but they should
give very similar corrected probe signals.

>> Which RMA implementation is correct?

So, aroma.affymetrix is basically just a wrapper for
affy::bg.adjust(), which I think was the original implementation of
RMA background correction.  I let Benilton comment on the oligo
implementation and it's origin.

>> Why does quantile normalization in aroma.affymetrics produce two
>> different distributions?

Because you first run quantile normalization on PMs only, then you
look at the density plot for all (PMs & MMs).  More below.

>>
>> Thank you in advance!
>>
>> Here are R scripts I have used:
>>
>> ################################
>> #aroma.affymetrix
>> library(aroma.affymetrix);
>> verbose <- Arguments$getVerbose(-8, timestamp=TRUE);
>>
>> # read files
>> cdf <- AffymetrixCdfFile('annotationData/chipTypes/HuGene-1_0-st-v1/HuGene-1_0-st-v1.cdf');
>> cs <- AffymetrixCelSet$byPath("rawData/mine/HuGene-1_0-st-v1/");

Have to bring it up: Please, do not setup your annotation data and and
data sets like this. An aroma.affymetrix script should not contain
paths/pathnames, cf. "Dos and Don'ts":

  http://aroma-project.org/node/102

The correct way to do the above is:

cs <- AffymetrixCelSet$byName("mine", chipType="HuGene-1_0-st-v1");

alternatively, if you wish to be explicit in what CDF is used, you can do:

cdf <- AffymetrixCdfFile$byChipType("HuGene-1_0-st-v1");
cs <- AffymetrixCelSet$byName("mine", cdf=cdf);

>>
>> # RMA
>> bc <- RmaBackgroundCorrection(cs);
>> csBC <- process(bc,verbose=verbose);
>>
>> # QuantileNormalization
>> qn <- QuantileNormalization(csBC, typesToUpdate="pm");
>> csN <- process(qn);

Note, the argument 'typesToUpdate' says that it is only PM probes that
will be updated. The default is that MMs are left "as is".

>>
>> # Plots
>> image_file <- ("aroma.affymetrix.RMA.png");
>> png(image_file,width=1028,height=768);
>> plotDensity(csBC);

Here you are plotting all probes on the array.  Since
RmaBackgroundCorrection is only correcting PM probes, you probably
want to do:

plotDensity(csBC, types="pm");

>> title("aroma.affymetrix RMA data");
>> dev.off();
>>
>> image_file <- ("aroma.affymetrix.QN.png");
>> png(image_file,width=1028,height=768);
>> plotDensity(csN);

plotDensity(csN, types="pm");

This is the key to why you get different density plots.  For a
thorough explaination of the various QN approaches, see Page
'Empirical probe-signal densities and rank-based quantile
normalization':

  http://aroma-project.org/node/141

>> title("aroma.affymetrix QN data");
>> dev.off()

What you haven't compared yet, because you misunderstood the RMA
pipeline, are the summarized probe signals from fitting the
log-additive RMA model.

FYI, it is part of our (24 hours) redundancy testing to assert that
the aroma.affymetrix RMA pipeline can reproduce the RMA pipeline and
RMA summary estimates of the affyPLM package.  You can see how well
this is done on Page 'Replication: RMA (background, normalization &
summarization)':

  http://www.aroma-project.org/replication/RMA

Hope this helps.

Henrik

>> ################################
>>
>> ################################
>> # oligo
>> library(oligo);
>> rawdata=read.celfiles(c("rawData/mine/HuGene-1_0-st-v1/sample1.CEL","rawData/mine/HuGene-1_0-st-v1/sample2.CEL"));
>> rmadata=rma(rawdata);
>> qndata=normalize.quantiles(exprs(rmadata))
>>
>> library(affy)
>> # Plots
>> image_file <- ("oligo.RMA.png");
>> png(image_file,width=1028,height=768);
>> plotDensity(exprs(rmadata));
>> title("oligo RMA data");
>> dev.off();
>>
>> image_file <- ("oligo.QN.png");
>> png(image_file,width=1028,height=768);
>> plotDensity(qndata);
>> title("oligo QN data");
>> dev.off()
>> ###############################
>>
>> Kind regards,
>>
>> Mikhail Pachkov
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list