[Bioc-devel] Bug in rma: zero variance of probeset summaries across arrays
Wolfgang Huber
huber at ebi.ac.uk
Tue Feb 3 15:38:06 CET 2009
Hi,
I am getting puzzling results from the rma function in the affy package,
and furthermore they are different depending on whether quantile
normalisation is called within rma or explicitely. The puzzling part is
that some probesets, in a perfectly ordinary experiment with 4 different
arrays, are summarized with exactly identical values across all arrays
(i.e. zero variance); while the underlying probe level data for these
probesets is variable and sits right in the middle of the dynamic range
(i.e. not at any of the extremes).
The code:
---------
library("affy")
library("genefilter")
library("preprocessCore")
options(error=recover)
files = c("PP_T1_1.CEL", "PP_T1_2.CEL", "PP_T2_1.CEL", "PP_T2_2c.CEL")
for(f in files)
download.file(file.path("http://www.ebi.ac.uk/~huber/pub", f),
file.path(tempdir(), f))
a = ReadAffy(filenames=file.path(tempdir(), files))
x1 = rma(a)
aq = a
exprs(aq) = normalize.quantiles(exprs(aq))
x2 = rma(aq, normalize=FALSE)
probeVars = function(x) {
r = rowSds(exprs(x))
ps = names(which(r==0))
cat(sprintf("%s: %d probes with zero variance.\n",
deparse(substitute(x)), length(ps)))
return(ps)
}
probeVars(a)
ps = probeVars(x1)
probeVars(x2)
print(exprs(a)[yeast2cdf[[ps[1]]][,1],])
-----------------------------------------------------------------------------
The output:
-----------
Background correcting
Normalizing
Calculating Expression
Background correcting
Calculating Expression
a: 0 probes with zero variance.
x1: 3 probes with zero variance.
x2: 0 probes with zero variance.
PP_T1_1.CEL PP_T1_2.CEL PP_T2_1.CEL PP_T2_2c.CEL
35244 185 236 269 253
198153 226 314 320 213
48500 375 370 408 373
109094 125 157 201 175
144686 107 131 153 100
214110 181 296 307 234
47302 355 430 397 282
108656 117 130 150 118
147528 158 144 159 135
237691 208 244 253 169
229910 94 138 147 133
> sessionInfo()
R version 2.9.0 Under development (unstable) (2009-02-03 r47829)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] yeast2cdf_2.3.0 preprocessCore_1.5.3 genefilter_1.23.1
[4] affy_1.21.7 Biobase_2.3.10 fortunes_1.3-6
loaded via a namespace (and not attached):
[1] affyio_1.11.3 annotate_1.21.3 AnnotationDbi_1.5.14
[4] DBI_0.2-4 RSQLite_0.7-1 splines_2.9.0
[7] survival_2.34-1 tools_2.9.0 xtable_1.5-4
Best wishes
Wolfgang
PS While trying to dissect this, I also experimented with constructing
special CDF environments (e.g. one that had one 'probeset' for each
probe), and I noted that repeatedly the C code of rma caused core dumps.
Perhaps a better response to unexpected shapes of CDF environments would
be to issue an error message. I can provide details if interested.
----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
More information about the Bioc-devel
mailing list