[Bioc-devel] Bug in rma: zero variance of probeset summaries across arrays
Ben Bolstad
bmb at bmbolstad.com
Tue Feb 3 16:33:49 CET 2009
Indeed it is a feature of median polish. See
https://stat.ethz.ch/pipermail/bioconductor/2004-June/004941.html
https://stat.ethz.ch/pipermail/bioconductor/2003-September/002499.html
On Tue, 2009-02-03 at 16:18 +0100, Holger Schwender wrote:
> Hi Wolfgang,
>
> the difference between the two outputs x1 and x2 are due to that in x1 the probe intensities are background-corrected first and then normalized, while in x2 they are first normalized and then background-corrected.
>
> The constant values are a feature of median polish: If median polish is applied to four or six samples, then this can lead to constant values for some of the probe sets. This has been mentioned years ago by Jim (I couldn't find his original email on this topic in the mailing list archive).
>
> Best,
> Holger
>
>
> -------- Original-Nachricht --------
> > Datum: Tue, 03 Feb 2009 15:38:06 +0100
> > Von: Wolfgang Huber <huber at ebi.ac.uk>
> > An: Bioconductor Developers <bioc-devel at stat.math.ethz.ch>
> > Betreff: [Bioc-devel] Bug in rma: zero variance of probeset summaries across arrays
>
> > Hi,
> >
> > I am getting puzzling results from the rma function in the affy package,
> > and furthermore they are different depending on whether quantile
> > normalisation is called within rma or explicitely. The puzzling part is
> > that some probesets, in a perfectly ordinary experiment with 4 different
> > arrays, are summarized with exactly identical values across all arrays
> > (i.e. zero variance); while the underlying probe level data for these
> > probesets is variable and sits right in the middle of the dynamic range
> > (i.e. not at any of the extremes).
> >
> > The code:
> > ---------
> >
> > library("affy")
> > library("genefilter")
> > library("preprocessCore")
> > options(error=recover)
> >
> > files = c("PP_T1_1.CEL", "PP_T1_2.CEL", "PP_T2_1.CEL", "PP_T2_2c.CEL")
> >
> > for(f in files)
> > download.file(file.path("http://www.ebi.ac.uk/~huber/pub", f),
> > file.path(tempdir(), f))
> >
> > a = ReadAffy(filenames=file.path(tempdir(), files))
> >
> > x1 = rma(a)
> >
> > aq = a
> > exprs(aq) = normalize.quantiles(exprs(aq))
> > x2 = rma(aq, normalize=FALSE)
> >
> > probeVars = function(x) {
> > r = rowSds(exprs(x))
> > ps = names(which(r==0))
> > cat(sprintf("%s: %d probes with zero variance.\n",
> > deparse(substitute(x)), length(ps)))
> > return(ps)
> > }
> >
> > probeVars(a)
> > ps = probeVars(x1)
> > probeVars(x2)
> >
> > print(exprs(a)[yeast2cdf[[ps[1]]][,1],])
> >
> > -----------------------------------------------------------------------------
> >
> > The output:
> > -----------
> > Background correcting
> > Normalizing
> > Calculating Expression
> > Background correcting
> > Calculating Expression
> >
> > a: 0 probes with zero variance.
> > x1: 3 probes with zero variance.
> > x2: 0 probes with zero variance.
> >
> > PP_T1_1.CEL PP_T1_2.CEL PP_T2_1.CEL PP_T2_2c.CEL
> > 35244 185 236 269 253
> > 198153 226 314 320 213
> > 48500 375 370 408 373
> > 109094 125 157 201 175
> > 144686 107 131 153 100
> > 214110 181 296 307 234
> > 47302 355 430 397 282
> > 108656 117 130 150 118
> > 147528 158 144 159 135
> > 237691 208 244 253 169
> > 229910 94 138 147 133
> >
> > > sessionInfo()
> > R version 2.9.0 Under development (unstable) (2009-02-03 r47829)
> > x86_64-unknown-linux-gnu
> >
> > locale:
> > LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices datasets utils methods base
> >
> > other attached packages:
> > [1] yeast2cdf_2.3.0 preprocessCore_1.5.3 genefilter_1.23.1
> > [4] affy_1.21.7 Biobase_2.3.10 fortunes_1.3-6
> >
> > loaded via a namespace (and not attached):
> > [1] affyio_1.11.3 annotate_1.21.3 AnnotationDbi_1.5.14
> > [4] DBI_0.2-4 RSQLite_0.7-1 splines_2.9.0
> > [7] survival_2.34-1 tools_2.9.0 xtable_1.5-4
> >
> >
> >
> > Best wishes
> > Wolfgang
> >
> >
> > PS While trying to dissect this, I also experimented with constructing
> > special CDF environments (e.g. one that had one 'probeset' for each
> > probe), and I noted that repeatedly the C code of rma caused core dumps.
> > Perhaps a better response to unexpected shapes of CDF environments would
> > be to issue an error message. I can provide details if interested.
> >
> > ----------------------------------------------------
> > Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
> >
> > _______________________________________________
> > Bioc-devel at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
>
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
More information about the Bioc-devel
mailing list