[Bioc-devel] Bug in rma: zero variance of probeset summaries across arrays

Wolfgang Huber huber at ebi.ac.uk
Tue Feb 3 16:13:04 CET 2009


Hi

I realised that x1 and x2 below cannot be expected to be the same, since
background correction and quantile normalisation are done in different
orders when they are produced.

My puzzlement about the many probesets that get assigned identical
summary values across arrays persists. Any hints about how to further
dissect this problem would be welcome!

	Cheers
	Wolfgang

03/02/2009 14:38 Wolfgang Huber scripsit
> Hi,
> 
> I am getting puzzling results from the rma function in the affy package,
> and furthermore they are different depending on whether quantile
> normalisation is called within rma or explicitely. The puzzling part is
> that some probesets, in a perfectly ordinary experiment with 4 different
> arrays, are summarized with exactly identical values across all arrays
> (i.e. zero variance); while the underlying probe level data for these
> probesets is variable and sits right in the middle of the dynamic range
> (i.e. not at any of the extremes).
> 
> The code:
> ---------
> 
> library("affy")
> library("genefilter")
> library("preprocessCore")
> options(error=recover)
> 
> files = c("PP_T1_1.CEL", "PP_T1_2.CEL", "PP_T2_1.CEL", "PP_T2_2c.CEL")
> 
> for(f in files)
>   download.file(file.path("http://www.ebi.ac.uk/~huber/pub", f),
>                 file.path(tempdir(), f))
> 
> a = ReadAffy(filenames=file.path(tempdir(), files))
> 
> x1 = rma(a)
> 
> aq = a
> exprs(aq) = normalize.quantiles(exprs(aq))
> x2 = rma(aq, normalize=FALSE)
> 
> probeVars = function(x) {
>  r = rowSds(exprs(x))
>  ps = names(which(r==0))
>  cat(sprintf("%s: %d probes with zero variance.\n",
> deparse(substitute(x)), length(ps)))
>  return(ps)
> }
> 
> probeVars(a)
> ps = probeVars(x1)
> probeVars(x2)
> 
> print(exprs(a)[yeast2cdf[[ps[1]]][,1],])
> 
> -----------------------------------------------------------------------------
> 
> 
> The output:
> -----------
> Background correcting
> Normalizing
> Calculating Expression
> Background correcting
> Calculating Expression
> 
> a: 0 probes with zero variance.
> x1: 3 probes with zero variance.
> x2: 0 probes with zero variance.
> 
>        PP_T1_1.CEL PP_T1_2.CEL PP_T2_1.CEL PP_T2_2c.CEL
> 35244          185         236         269          253
> 198153         226         314         320          213
> 48500          375         370         408          373
> 109094         125         157         201          175
> 144686         107         131         153          100
> 214110         181         296         307          234
> 47302          355         430         397          282
> 108656         117         130         150          118
> 147528         158         144         159          135
> 237691         208         244         253          169
> 229910          94         138         147          133
> 
>> sessionInfo()
> R version 2.9.0 Under development (unstable) (2009-02-03 r47829)
> x86_64-unknown-linux-gnu
> 
> locale:
> LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
> 
> 
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
> 
> other attached packages:
> [1] yeast2cdf_2.3.0      preprocessCore_1.5.3 genefilter_1.23.1
> [4] affy_1.21.7          Biobase_2.3.10       fortunes_1.3-6
> 
> loaded via a namespace (and not attached):
> [1] affyio_1.11.3        annotate_1.21.3      AnnotationDbi_1.5.14
> [4] DBI_0.2-4            RSQLite_0.7-1        splines_2.9.0
> [7] survival_2.34-1      tools_2.9.0          xtable_1.5-4
> 
> 
> 
> Best wishes
>      Wolfgang
> 
> 
> PS While trying to dissect this, I also experimented with constructing
> special CDF environments (e.g. one that had one 'probeset' for each
> probe), and I noted that repeatedly the C code of rma caused core dumps.
> Perhaps a better response to unexpected shapes of CDF environments would
> be to issue an error message. I can provide details if interested.
> 
> ----------------------------------------------------
> Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
> 
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list