[Bioc-devel] Bug in rma: zero variance of probeset summaries across arrays

Ben Bolstad bmb at bmbolstad.com
Tue Feb 3 16:33:49 CET 2009


Indeed it is a feature of median polish. See

https://stat.ethz.ch/pipermail/bioconductor/2004-June/004941.html

https://stat.ethz.ch/pipermail/bioconductor/2003-September/002499.html



On Tue, 2009-02-03 at 16:18 +0100, Holger Schwender wrote:
> Hi Wolfgang,
> 
> the difference between the two outputs x1 and x2 are due to that in x1 the probe intensities are background-corrected first and then normalized, while in x2 they are first normalized and then background-corrected.
> 
> The constant values are a feature of median polish: If median polish is applied to four or six samples, then this can lead to constant values for some of the probe sets. This has been mentioned years ago by Jim (I couldn't find his original email on this topic in the mailing list archive).
> 
> Best,
> Holger
> 
> 
> -------- Original-Nachricht --------
> > Datum: Tue, 03 Feb 2009 15:38:06 +0100
> > Von: Wolfgang Huber <huber at ebi.ac.uk>
> > An: Bioconductor Developers <bioc-devel at stat.math.ethz.ch>
> > Betreff: [Bioc-devel] Bug in rma: zero variance of probeset summaries across	arrays
> 
> > Hi,
> > 
> > I am getting puzzling results from the rma function in the affy package, 
> > and furthermore they are different depending on whether quantile 
> > normalisation is called within rma or explicitely. The puzzling part is 
> > that some probesets, in a perfectly ordinary experiment with 4 different 
> > arrays, are summarized with exactly identical values across all arrays 
> > (i.e. zero variance); while the underlying probe level data for these 
> > probesets is variable and sits right in the middle of the dynamic range 
> > (i.e. not at any of the extremes).
> > 
> > The code:
> > ---------
> > 
> > library("affy")
> > library("genefilter")
> > library("preprocessCore")
> > options(error=recover)
> > 
> > files = c("PP_T1_1.CEL", "PP_T1_2.CEL", "PP_T2_1.CEL", "PP_T2_2c.CEL")
> > 
> > for(f in files)
> >    download.file(file.path("http://www.ebi.ac.uk/~huber/pub", f),
> >                  file.path(tempdir(), f))
> > 
> > a = ReadAffy(filenames=file.path(tempdir(), files))
> > 
> > x1 = rma(a)
> > 
> > aq = a
> > exprs(aq) = normalize.quantiles(exprs(aq))
> > x2 = rma(aq, normalize=FALSE)
> > 
> > probeVars = function(x) {
> >   r = rowSds(exprs(x))
> >   ps = names(which(r==0))
> >   cat(sprintf("%s: %d probes with zero variance.\n", 
> > deparse(substitute(x)), length(ps)))
> >   return(ps)
> > }
> > 
> > probeVars(a)
> > ps = probeVars(x1)
> > probeVars(x2)
> > 
> > print(exprs(a)[yeast2cdf[[ps[1]]][,1],])
> > 
> > -----------------------------------------------------------------------------
> > 
> > The output:
> > -----------
> > Background correcting
> > Normalizing
> > Calculating Expression
> > Background correcting
> > Calculating Expression
> > 
> > a: 0 probes with zero variance.
> > x1: 3 probes with zero variance.
> > x2: 0 probes with zero variance.
> > 
> >         PP_T1_1.CEL PP_T1_2.CEL PP_T2_1.CEL PP_T2_2c.CEL
> > 35244          185         236         269          253
> > 198153         226         314         320          213
> > 48500          375         370         408          373
> > 109094         125         157         201          175
> > 144686         107         131         153          100
> > 214110         181         296         307          234
> > 47302          355         430         397          282
> > 108656         117         130         150          118
> > 147528         158         144         159          135
> > 237691         208         244         253          169
> > 229910          94         138         147          133
> > 
> >  > sessionInfo()
> > R version 2.9.0 Under development (unstable) (2009-02-03 r47829)
> > x86_64-unknown-linux-gnu
> > 
> > locale:
> > LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
> > 
> > attached base packages:
> > [1] stats     graphics  grDevices datasets  utils     methods   base
> > 
> > other attached packages:
> > [1] yeast2cdf_2.3.0      preprocessCore_1.5.3 genefilter_1.23.1
> > [4] affy_1.21.7          Biobase_2.3.10       fortunes_1.3-6
> > 
> > loaded via a namespace (and not attached):
> > [1] affyio_1.11.3        annotate_1.21.3      AnnotationDbi_1.5.14
> > [4] DBI_0.2-4            RSQLite_0.7-1        splines_2.9.0
> > [7] survival_2.34-1      tools_2.9.0          xtable_1.5-4
> > 
> > 
> > 
> > Best wishes
> >       Wolfgang
> > 
> > 
> > PS While trying to dissect this, I also experimented with constructing 
> > special CDF environments (e.g. one that had one 'probeset' for each 
> > probe), and I noted that repeatedly the C code of rma caused core dumps. 
> > Perhaps a better response to unexpected shapes of CDF environments would 
> > be to issue an error message. I can provide details if interested.
> > 
> > ----------------------------------------------------
> > Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
> > 
> > _______________________________________________
> > Bioc-devel at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> --
> 
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list