[BioC] Separate quantile normalization but common probe summary by median polish (oligo package)?

Thu Aug 2 16:03:37 CEST 2012

Hi Johanna,

On 8/2/2012 5:30 AM, Johanna Schott [guest] wrote:
> Dear list,
>
> I am pre-processing Affymetrix Mouse Gene 1.0 ST Arrays and use the oligo package. I do not want to quantile normalize them all together, because my samples come from different polysome fractions or compartments of the cell, and therefore show consistent and biologically meaningful differences in signal distribution.
> For separate probe summary by median polish, however, the groups are too small:
> The smallest group has only 3 microarrays, which leads to identical values within many probe sets across the three samples.
>
> My idea is to perform quantile normalization within the individual groups, but probe summary for all 30 microarrays together, to get a more reliable estimate of the probe effects and to avoid losing the variability between my samples when a group consists of only 3 microarrays.
>
> Is this reasonable, or is anyone aware of artifacts that I would introduce by performing median polish for probe summary on microarrays that have not been quantile normalized together?

I don't think that is a good idea. Fitting a model by median polish 
assumes that the arrays have similar distributions and variances, 
which will clearly not be the case if you normalize the groups 
separately.
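
As a rough illustration of both points, here is a minimal sketch in base R on
simulated data. Everything in it is an assumption made for the example: the
intensities are invented, `quantile_normalize` is a hypothetical simplified
helper (it is not the routine oligo calls), and `stats::medpolish` stands in
for the summarization step. It shows that arrays normalized per group keep
different distributions, and what the additive median polish fit looks like.

```r
# Hypothetical, simplified quantile normalization (ties broken by order).
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))   # mean of the sorted columns
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Invented log2 intensities: two groups of 3 arrays with different spread.
set.seed(1)
n <- 200
x <- cbind(matrix(rnorm(n * 3, mean = 8,  sd = 1), n, 3),
           matrix(rnorm(n * 3, mean = 10, sd = 2), n, 3))
group <- rep(c("A", "B"), each = 3)

# Joint normalization: every array ends up with the same distribution.
joint <- quantile_normalize(x)

# Per-group normalization: identical within a group, different between groups.
separate <- x
for (g in unique(group)) {
  cols <- group == g
  separate[, cols] <- quantile_normalize(x[, cols])
}
sd_joint    <- apply(joint, 2, sd)
sd_separate <- apply(separate, 2, sd)

# Median polish on one made-up probe set (rows = probes, columns = arrays):
# value = overall + probe effect + array effect + residual.
probe_set <- separate[1:8, ]
fit <- medpolish(probe_set, trace.iter = FALSE)
summary_per_array <- fit$overall + fit$col   # RMA-style value per array
reconstructed <- fit$overall + outer(fit$row, fit$col, "+") + fit$residuals
```

Comparing `sd_joint` with `sd_separate` shows the difference in spread that
survives per-group normalization and is then passed on to the median polish fit.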

I assume you are planning to compare the different polysome fractions. 
In addition, I assume that extracting the polysomes is a relatively 
laborious process that is likely to introduce technical variability.

Given the above assumptions, this will likely be a difficult data set to 
analyze, and I would think you will have to make some pretty strong 
assumptions. I (and most others who answer questions on this list) am 
hesitant to offer any statistical advice - without data in hand it is 
very difficult to say what you should do. In addition, most of us make a 
living by analyzing data, so doing our work for free isn't a viable 
strategy.

That said, I would tend to start off simple, processing all samples 
together and seeing if I have evidence that doing so was a bad idea, 
rather than the converse.
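
One possible way to look for such evidence, sketched under stated assumptions:
summarize the raw log2 intensities per array and compare the groups before
deciding how to normalize. `array_summaries` is a hypothetical helper, and the
matrix `sim` is simulated; with real data the input might be something like
`log2(pm(all_groups))`.

```r
# Hypothetical helper: per-array median and IQR of log2 intensities by group.
array_summaries <- function(log_intensity, group) {
  stopifnot(ncol(log_intensity) == length(group))
  data.frame(
    array  = seq_len(ncol(log_intensity)),
    group  = group,
    median = apply(log_intensity, 2, median),
    iqr    = apply(log_intensity, 2, IQR)
  )
}

# Simulated stand-in for raw log2 PM intensities; all values are invented.
set.seed(2)
sim <- cbind(matrix(rnorm(500 * 3, mean = 8,  sd = 1), 500, 3),
             matrix(rnorm(500 * 3, mean = 10, sd = 2), 500, 3))
grp <- rep(c("fraction1", "fraction2"), each = 3)

s <- array_summaries(sim, grp)

# Group-level averages of the per-array summaries.
by_group <- aggregate(cbind(median, iqr) ~ group, data = s, FUN = mean)
print(by_group)
```

Large, consistent differences between groups relative to the spread within
groups would be one hint that joint normalization deserves a closer look.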

Best,

Jim

>
> Here is some code to illustrate what I am doing:
>
> # I load the required packages:
> library("oligo")
> library("pd.mogene.1.0.st.v1")
>
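> # the CEL files are read twice, once per group (here only group 1 as an example), and once all together;
> # full.names = TRUE returns paths that include the directory, so reading works from any working directory:
> list_cel <- list.celfiles("group1", full.names = TRUE)
> group1 <- read.celfiles(list_cel)
>
> list_cel <- list.celfiles("all_groups", full.names = TRUE)
> all_groups <- read.celfiles(list_cel)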
>
> # I perform background correction and quantile normalization for the pm values of the individual groups (here only group1):
> pms_group1<- pm(group1)
> bg_group1<- backgroundCorrect(pms_group1)
> norm_group1<- normalize(bg_group1)
>
> # I replace the pm values in the GeneFeatureSet all_groups by the normalized values of group 1
> # (this assumes the three arrays of group 1 are columns 1 to 3 of all_groups, in the same order):
> exprs(all_groups)[pmindex(all_groups), 1:3] <- norm_group1[, 1:3]
>
> # after having done this for ALL the groups, I perform only the probe summary on all_groups:
> pp_all <- rma(all_groups, background = FALSE, normalize = FALSE, target = "core")
>
>
> I guess that fRMA (the frma package together with frmaTools) would be an alternative for pre-processing my microarrays in small groups?
>
> Thank you very much in advance for warning me if my idea is wrong!
>
> Johanna Schott
>
>   -- output of sessionInfo():
>
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                    LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] mogene10sttranscriptcluster.db_8.0.1 org.Mm.eg.db_2.7.1                   AnnotationDbi_1.18.1                 Biobase_2.16.0
>   [5] BiocGenerics_0.2.0                   pd.mogene.1.0.st.v1_3.6.0            RSQLite_0.11.1                       DBI_0.2-5
>   [9] oligo_1.20.4                         oligoClasses_1.18.0
>
> loaded via a namespace (and not attached):
>   [1] affxparser_1.28.1     affyio_1.24.0         BiocInstaller_1.4.7   Biostrings_2.24.1     bit_1.1-8             codetools_0.2-8       ff_2.2-7              foreach_1.4.0
>   [9] IRanges_1.14.4        iterators_1.0.6       preprocessCore_1.18.0 splines_2.15.1        stats4_2.15.1         tools_2.15.1          zlibbioc_1.2.0
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099