[BioC] RMA/QuantileNormalization results difference between oligo and aroma.affymetrix for Hugene
Benilton Carvalho
beniltoncarvalho at gmail.com
Fri Feb 26 14:29:01 CET 2010
regarding oligo's implementation of RMA, it's the same as the one in affy.
b
On Fri, Feb 26, 2010 at 12:47 PM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> Hi,
>
> you concerns about reproducibility are very important. Luckily your
> observations are based on mistakes as explained below.
>
> On Fri, Feb 26, 2010 at 12:05 PM, Benilton Carvalho
> <beniltoncarvalho at gmail.com> wrote:
>> Quantile normalization is already one step in the RMA workflow.
>> Therefore, there's no need to normalize the data again once you've
>> gone RMA, ie. (regarding oligo) your call
>> "normalize.quantiles(exprs(rmadata))" should be dropped.
>>
>> Using the defaults, rma() in oligo will:
>>
>> 1) Background correct (via the RMA convolution model)
>> 2) Quantile normalize
>> 3) Summarize via median-polish.
>
> Yes, as Benilton points out it seems like you've misunderstood what
> RMA does. The author of RMA (Ben Bolstad) defines the term RMA to
> mean the complete preprocessing suite including summarization.
>
>>
>> b
>>
>> On Fri, Feb 26, 2010 at 10:46 AM, Mikhail Pachkov <pachkov at gmail.com> wrote:
>>> Dear All,
>>>
>>> I am new in microarray analysis and need your expertise.
>>> I need to develop procedure for producing expression values from CEL
>>> files. Data should processed with RMA and quantile normalized. I have
>>> tried two packages - oligo and aroma.affymetrix. Obtained results are
>>> quite different form my point of view. Moreover
>>> aroma.affymetrix::QuantileNormalization function produce dta which do
>>> not look like they were quantile normalized.
>
> What is 'dta'?
>
>>> I have made density plots of data after RMA and after quantile
>>> normalization which are available here
>>> http://test.swissregulon.unibas.ch/bioc/index.html There are also
>>> links to two CEL files I have used.
>>>
>>> I have a few questions:
>
> Below, I will take that you mean "RMA background correct" when you say "RMA".
>
>>> Why RMA results are so different?
>
> The RMA-style background correction in aroma.affymetrix utilizes
> affy::bg.adjust() [and normalizes PM probes only]. I'm not sure what
> algorithm/implementation oligo is using for this step, but they should
> give very similar corrected probe signals.
>
>>> Which RMA implementation is correct?
>
> So, aroma.affymetrix is basically just a wrapper for
> affy::bg.adjust(), which I think was the original implementation of
> RMA background correction. I let Benilton comment on the oligo
> implementation and it's origin.
>
>>> Why does quantile normalization in aroma.affymetrics produce two
>>> different distributions?
>
> Because you first run quantile normalization on PMs only, then you
> look at the density plot for all (PMs & MMs). More below.
>
>>>
>>> Thank you in advance!
>>>
>>> Here are R scripts I have used:
>>>
>>> ################################
>>> #aroma.affymetrix
>>> library(aroma.affymetrix);
>>> verbose <- Arguments$getVerbose(-8, timestamp=TRUE);
>>>
>>> # read files
>>> cdf <- AffymetrixCdfFile('annotationData/chipTypes/HuGene-1_0-st-v1/HuGene-1_0-st-v1.cdf');
>>> cs <- AffymetrixCelSet$byPath("rawData/mine/HuGene-1_0-st-v1/");
>
> Have to bring it up: Please, do not setup your annotation data and and
> data sets like this. An aroma.affymetrix script should not contain
> paths/pathnames, cf. "Dos and Don'ts":
>
> http://aroma-project.org/node/102
>
> The correct way to do the above is:
>
> cs <- AffymetrixCelSet$byName("mine", chipType="HuGene-1_0-st-v1");
>
> alternatively, if you wish to be explicit in what CDF is used, you can do:
>
> cdf <- AffymetrixCdfFile$byChipType("HuGene-1_0-st-v1");
> cs <- AffymetrixCelSet$byName("mine", cdf=cdf);
>
>>>
>>> # RMA
>>> bc <- RmaBackgroundCorrection(cs);
>>> csBC <- process(bc,verbose=verbose);
>>>
>>> # QuantileNormalization
>>> qn <- QuantileNormalization(csBC, typesToUpdate="pm");
>>> csN <- process(qn);
>
> Note, the argument 'typesToUpdate' says that it is only PM probes that
> will be updated. The default is that MMs are left "as is".
>
>>>
>>> # Plots
>>> image_file <- ("aroma.affymetrix.RMA.png");
>>> png(image_file,width=1028,height=768);
>>> plotDensity(csBC);
>
> Here you are plotting all probes on the array. Since
> RmaBackgroundCorrection is only correcting PM probes, you probably
> want to do:
>
> plotDensity(csBC, types="pm");
>
>>> title("aroma.affymetrix RMA data");
>>> dev.off();
>>>
>>> image_file <- ("aroma.affymetrix.QN.png");
>>> png(image_file,width=1028,height=768);
>>> plotDensity(csN);
>
> plotDensity(csN, types="pm");
>
> This is the key to why you get different density plots. For a
> thorough explaination of the various QN approaches, see Page
> 'Empirical probe-signal densities and rank-based quantile
> normalization':
>
> http://aroma-project.org/node/141
>
>>> title("aroma.affymetrix QN data");
>>> dev.off()
>
> What you haven't compared yet, because you misunderstood the RMA
> pipeline, are the summarized probe signals from fitting the
> log-additive RMA model.
>
> FYI, it is part of our (24 hours) redundancy testing to assert that
> the aroma.affymetrix RMA pipeline can reproduce the RMA pipeline and
> RMA summary estimates of the affyPLM package. You can see how well
> this is done on Page 'Replication: RMA (background, normalization &
> summarization)':
>
> http://www.aroma-project.org/replication/RMA
>
> Hope this helps.
>
> Henrik
>
>>> ################################
>>>
>>> ################################
>>> # oligo
>>> library(oligo);
>>> rawdata=read.celfiles(c("rawData/mine/HuGene-1_0-st-v1/sample1.CEL","rawData/mine/HuGene-1_0-st-v1/sample2.CEL"));
>>> rmadata=rma(rawdata);
>>> qndata=normalize.quantiles(exprs(rmadata))
>>>
>>> library(affy)
>>> # Plots
>>> image_file <- ("oligo.RMA.png");
>>> png(image_file,width=1028,height=768);
>>> plotDensity(exprs(rmadata));
>>> title("oligo RMA data");
>>> dev.off();
>>>
>>> image_file <- ("oligo.QN.png");
>>> png(image_file,width=1028,height=768);
>>> plotDensity(qndata);
>>> title("oligo QN data");
>>> dev.off()
>>> ###############################
>>>
>>> Kind regards,
>>>
>>> Mikhail Pachkov
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
More information about the Bioconductor
mailing list