[Bioc-devel] Bug in rma: zero variance of probeset summaries across arrays

James W. MacDonald jmacdon at med.umich.edu
Tue Feb 3 16:34:28 CET 2009


Hi Holger,

Holger Schwender wrote:
> Hi Wolfgang,
> 
> the difference between the two outputs x1 and x2 are due to that in
> x1 the probe intensities are background-corrected first and then
> normalized, while in x2 they are first normalized and then
> background-corrected.
> 
> The constant values are a feature of median polish: If median polish
> is applied to four or six samples, then this can lead to constant
> values for some of the probe sets. This has been mentioned years ago
> by Jim (I couldn't find his original email on this topic in the
> mailing list archive).

Actually this only occurs with a small, odd number of samples, not even. 
Well, it could happen for a small number of probesets if the n/2 and n/2 
+ 1 ordered values were identical, but it is much more common in the 
case of 3 or 5 samples.

Best,

Jim


> 
> Best, Holger
> 
> 
> -------- Original-Nachricht --------
>> Datum: Tue, 03 Feb 2009 15:38:06 +0100 Von: Wolfgang Huber
>> <huber at ebi.ac.uk> An: Bioconductor Developers
>> <bioc-devel at stat.math.ethz.ch> Betreff: [Bioc-devel] Bug in rma:
>> zero variance of probeset summaries across	arrays
> 
>> Hi,
>> 
>> I am getting puzzling results from the rma function in the affy
>> package, and furthermore they are different depending on whether
>> quantile normalisation is called within rma or explicitely. The
>> puzzling part is that some probesets, in a perfectly ordinary
>> experiment with 4 different arrays, are summarized with exactly
>> identical values across all arrays (i.e. zero variance); while the
>> underlying probe level data for these probesets is variable and
>> sits right in the middle of the dynamic range (i.e. not at any of
>> the extremes).
>> 
>> The code: ---------
>> 
>> library("affy") library("genefilter") library("preprocessCore") 
>> options(error=recover)
>> 
>> files = c("PP_T1_1.CEL", "PP_T1_2.CEL", "PP_T2_1.CEL",
>> "PP_T2_2c.CEL")
>> 
>> for(f in files) 
>> download.file(file.path("http://www.ebi.ac.uk/~huber/pub", f), 
>> file.path(tempdir(), f))
>> 
>> a = ReadAffy(filenames=file.path(tempdir(), files))
>> 
>> x1 = rma(a)
>> 
>> aq = a exprs(aq) = normalize.quantiles(exprs(aq)) x2 = rma(aq,
>> normalize=FALSE)
>> 
>> probeVars = function(x) { r = rowSds(exprs(x)) ps =
>> names(which(r==0)) cat(sprintf("%s: %d probes with zero
>> variance.\n", deparse(substitute(x)), length(ps))) return(ps) }
>> 
>> probeVars(a) ps = probeVars(x1) probeVars(x2)
>> 
>> print(exprs(a)[yeast2cdf[[ps[1]]][,1],])
>> 
>> -----------------------------------------------------------------------------
>> 
>> 
>> The output: ----------- Background correcting Normalizing 
>> Calculating Expression Background correcting Calculating Expression
>> 
>> 
>> a: 0 probes with zero variance. x1: 3 probes with zero variance. 
>> x2: 0 probes with zero variance.
>> 
>> PP_T1_1.CEL PP_T1_2.CEL PP_T2_1.CEL PP_T2_2c.CEL 35244          185
>> 236         269          253 198153         226         314
>> 320          213 48500          375         370         408
>> 373 109094         125         157         201          175 144686
>> 107         131         153          100 214110         181
>> 296         307          234 47302          355         430
>> 397          282 108656         117         130         150
>> 118 147528         158         144         159          135 237691
>> 208         244         253          169 229910          94
>> 138         147          133
>> 
>>> sessionInfo()
>> R version 2.9.0 Under development (unstable) (2009-02-03 r47829) 
>> x86_64-unknown-linux-gnu
>> 
>> locale: 
>> LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
>> 
>> 
>> attached base packages: [1] stats     graphics  grDevices datasets
>> utils     methods   base
>> 
>> other attached packages: [1] yeast2cdf_2.3.0
>> preprocessCore_1.5.3 genefilter_1.23.1 [4] affy_1.21.7
>> Biobase_2.3.10       fortunes_1.3-6
>> 
>> loaded via a namespace (and not attached): [1] affyio_1.11.3
>> annotate_1.21.3      AnnotationDbi_1.5.14 [4] DBI_0.2-4
>> RSQLite_0.7-1        splines_2.9.0 [7] survival_2.34-1
>> tools_2.9.0          xtable_1.5-4
>> 
>> 
>> 
>> Best wishes Wolfgang
>> 
>> 
>> PS While trying to dissect this, I also experimented with
>> constructing special CDF environments (e.g. one that had one
>> 'probeset' for each probe), and I noted that repeatedly the C code
>> of rma caused core dumps. Perhaps a better response to unexpected
>> shapes of CDF environments would be to issue an error message. I
>> can provide details if interested.
>> 
>> ---------------------------------------------------- Wolfgang
>> Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
>> 
>> _______________________________________________ 
>> Bioc-devel at stat.math.ethz.ch mailing list 
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> --
> 
> _______________________________________________ 
> Bioc-devel at stat.math.ethz.ch mailing list 
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662



More information about the Bioc-devel mailing list