[Bioc-devel] Bug in rma: zero variance of probeset summaries across arrays

Wolfgang Huber huber at ebi.ac.uk
Tue Feb 3 15:38:06 CET 2009


Hi,

I am getting puzzling results from the rma function in the affy package, 
and furthermore they are different depending on whether quantile 
normalisation is called within rma or explicitely. The puzzling part is 
that some probesets, in a perfectly ordinary experiment with 4 different 
arrays, are summarized with exactly identical values across all arrays 
(i.e. zero variance); while the underlying probe level data for these 
probesets is variable and sits right in the middle of the dynamic range 
(i.e. not at any of the extremes).

The code:
---------

library("affy")
library("genefilter")
library("preprocessCore")
options(error=recover)

files = c("PP_T1_1.CEL", "PP_T1_2.CEL", "PP_T2_1.CEL", "PP_T2_2c.CEL")

for(f in files)
   download.file(file.path("http://www.ebi.ac.uk/~huber/pub", f),
                 file.path(tempdir(), f))

a = ReadAffy(filenames=file.path(tempdir(), files))

x1 = rma(a)

aq = a
exprs(aq) = normalize.quantiles(exprs(aq))
x2 = rma(aq, normalize=FALSE)

probeVars = function(x) {
  r = rowSds(exprs(x))
  ps = names(which(r==0))
  cat(sprintf("%s: %d probes with zero variance.\n", 
deparse(substitute(x)), length(ps)))
  return(ps)
}

probeVars(a)
ps = probeVars(x1)
probeVars(x2)

print(exprs(a)[yeast2cdf[[ps[1]]][,1],])

-----------------------------------------------------------------------------

The output:
-----------
Background correcting
Normalizing
Calculating Expression
Background correcting
Calculating Expression

a: 0 probes with zero variance.
x1: 3 probes with zero variance.
x2: 0 probes with zero variance.

        PP_T1_1.CEL PP_T1_2.CEL PP_T2_1.CEL PP_T2_2c.CEL
35244          185         236         269          253
198153         226         314         320          213
48500          375         370         408          373
109094         125         157         201          175
144686         107         131         153          100
214110         181         296         307          234
47302          355         430         397          282
108656         117         130         150          118
147528         158         144         159          135
237691         208         244         253          169
229910          94         138         147          133

 > sessionInfo()
R version 2.9.0 Under development (unstable) (2009-02-03 r47829)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] yeast2cdf_2.3.0      preprocessCore_1.5.3 genefilter_1.23.1
[4] affy_1.21.7          Biobase_2.3.10       fortunes_1.3-6

loaded via a namespace (and not attached):
[1] affyio_1.11.3        annotate_1.21.3      AnnotationDbi_1.5.14
[4] DBI_0.2-4            RSQLite_0.7-1        splines_2.9.0
[7] survival_2.34-1      tools_2.9.0          xtable_1.5-4



Best wishes
      Wolfgang


PS While trying to dissect this, I also experimented with constructing 
special CDF environments (e.g. one that had one 'probeset' for each 
probe), and I noted that repeatedly the C code of rma caused core dumps. 
Perhaps a better response to unexpected shapes of CDF environments would 
be to issue an error message. I can provide details if interested.

----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber



More information about the Bioc-devel mailing list