[Bioc-devel] Bug in rma: zero variance of probeset summaries across arrays
James W. MacDonald
jmacdon at med.umich.edu
Tue Feb 3 16:34:28 CET 2009
Hi Holger,
Holger Schwender wrote:
> Hi Wolfgang,
>
> the difference between the two outputs x1 and x2 are due to that in
> x1 the probe intensities are background-corrected first and then
> normalized, while in x2 they are first normalized and then
> background-corrected.
>
> The constant values are a feature of median polish: If median polish
> is applied to four or six samples, then this can lead to constant
> values for some of the probe sets. This has been mentioned years ago
> by Jim (I couldn't find his original email on this topic in the
> mailing list archive).
Actually this only occurs with a small, odd number of samples, not even.
Well, it could happen for a small number of probesets if the n/2 and n/2
+ 1 ordered values were identical, but it is much more common in the
case of 3 or 5 samples.
Best,
Jim
>
> Best, Holger
>
>
> -------- Original-Nachricht --------
>> Datum: Tue, 03 Feb 2009 15:38:06 +0100 Von: Wolfgang Huber
>> <huber at ebi.ac.uk> An: Bioconductor Developers
>> <bioc-devel at stat.math.ethz.ch> Betreff: [Bioc-devel] Bug in rma:
>> zero variance of probeset summaries across arrays
>
>> Hi,
>>
>> I am getting puzzling results from the rma function in the affy
>> package, and furthermore they are different depending on whether
>> quantile normalisation is called within rma or explicitely. The
>> puzzling part is that some probesets, in a perfectly ordinary
>> experiment with 4 different arrays, are summarized with exactly
>> identical values across all arrays (i.e. zero variance); while the
>> underlying probe level data for these probesets is variable and
>> sits right in the middle of the dynamic range (i.e. not at any of
>> the extremes).
>>
>> The code: ---------
>>
>> library("affy") library("genefilter") library("preprocessCore")
>> options(error=recover)
>>
>> files = c("PP_T1_1.CEL", "PP_T1_2.CEL", "PP_T2_1.CEL",
>> "PP_T2_2c.CEL")
>>
>> for(f in files)
>> download.file(file.path("http://www.ebi.ac.uk/~huber/pub", f),
>> file.path(tempdir(), f))
>>
>> a = ReadAffy(filenames=file.path(tempdir(), files))
>>
>> x1 = rma(a)
>>
>> aq = a exprs(aq) = normalize.quantiles(exprs(aq)) x2 = rma(aq,
>> normalize=FALSE)
>>
>> probeVars = function(x) { r = rowSds(exprs(x)) ps =
>> names(which(r==0)) cat(sprintf("%s: %d probes with zero
>> variance.\n", deparse(substitute(x)), length(ps))) return(ps) }
>>
>> probeVars(a) ps = probeVars(x1) probeVars(x2)
>>
>> print(exprs(a)[yeast2cdf[[ps[1]]][,1],])
>>
>> -----------------------------------------------------------------------------
>>
>>
>> The output: ----------- Background correcting Normalizing
>> Calculating Expression Background correcting Calculating Expression
>>
>>
>> a: 0 probes with zero variance. x1: 3 probes with zero variance.
>> x2: 0 probes with zero variance.
>>
>> PP_T1_1.CEL PP_T1_2.CEL PP_T2_1.CEL PP_T2_2c.CEL 35244 185
>> 236 269 253 198153 226 314
>> 320 213 48500 375 370 408
>> 373 109094 125 157 201 175 144686
>> 107 131 153 100 214110 181
>> 296 307 234 47302 355 430
>> 397 282 108656 117 130 150
>> 118 147528 158 144 159 135 237691
>> 208 244 253 169 229910 94
>> 138 147 133
>>
>>> sessionInfo()
>> R version 2.9.0 Under development (unstable) (2009-02-03 r47829)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
>>
>>
>> attached base packages: [1] stats graphics grDevices datasets
>> utils methods base
>>
>> other attached packages: [1] yeast2cdf_2.3.0
>> preprocessCore_1.5.3 genefilter_1.23.1 [4] affy_1.21.7
>> Biobase_2.3.10 fortunes_1.3-6
>>
>> loaded via a namespace (and not attached): [1] affyio_1.11.3
>> annotate_1.21.3 AnnotationDbi_1.5.14 [4] DBI_0.2-4
>> RSQLite_0.7-1 splines_2.9.0 [7] survival_2.34-1
>> tools_2.9.0 xtable_1.5-4
>>
>>
>>
>> Best wishes Wolfgang
>>
>>
>> PS While trying to dissect this, I also experimented with
>> constructing special CDF environments (e.g. one that had one
>> 'probeset' for each probe), and I noted that repeatedly the C code
>> of rma caused core dumps. Perhaps a better response to unexpected
>> shapes of CDF environments would be to issue an error message. I
>> can provide details if interested.
>>
>> ---------------------------------------------------- Wolfgang
>> Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
>>
>> _______________________________________________
>> Bioc-devel at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
>
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
More information about the Bioc-devel
mailing list