[Bioc-devel] Bug in rma: zero variance of probeset summaries across arrays
Kasper Daniel Hansen
khansen at stat.berkeley.edu
Tue Feb 3 20:08:49 CET 2009
The rma command works fine on my system - fresh svn checkout of R-
devel from today, all relevant packages updated. I did not load
genefilter as you do below, but aside from that it is the same.
I have only tried once, and I ran it with R CMD BATCH.
2 comments:
1) It seems to complain about colnames and an object with no
dimension. Perhaps a drop = FALSE needs to be put in place somewhere
in the code (but why it works for me, I have no idea about)
2) In general, I would also have put a colnames = "pm" on the probeset
matrices you make below. Hard to know whether it is a requirement in
all parts of the code, but seems safer.
Kasper
On Feb 3, 2009, at 11:14 , Wolfgang Huber wrote:
> Thank you, Holger, Jim, Ben,
>
> (1) this is indeed a peculiar feature of the medianpolish part of
> RMA. I apologize to Ben for using the harsh word "bug"
> inappropriately in this context. I was misled by observations that
> using different normalisation or background correction methods
> seemed to "fix" the problem, and had forgotten about those previous
> threads. Sorry.
>
> I wonder whether anyone has done research on how to improve on
> medianpolish for small datasets and why people haven't adapted it.
>
> (2) One further potential problem remains, and I hope that bringing
> this up might help to make sure that rma (which is obviously a core
> component of Bioconductor) is robust even under the most adverse
> circumstances (e.g. myself :). With the script below, I get a core
> dump. This is on a freshly checked out and built R (from today) and
> all packages then installed from scratch (sessionInfo as in previous
> mail below). It does not seem to happen on R-2.8.1. and Bioc release
> 2.3., and even on R-dev, whether it happens seems to depend on the
> history of the R session. One hypothesis is that this could be
> related to the "destructive=TRUE" parameter of rma?
>
> Can anyone reproduce this?
>
> Best wishes
> Wolfgang
>
>
> library("affy")
> library("genefilter")
> options(error=recover)
>
> files = c("PP_T1_1.CEL", "PP_T1_2.CEL", "PP_T2_1.CEL", "PP_T2_2c.CEL")
>
> #celdir = tempdir()
> #for(f in files)
> # download.file(file.path("http://www.ebi.ac.uk/~huber/pub", f),
> # file.path(celdir, f))
> celdir="."
>
> a = ReadAffy(filenames=file.path(celdir, files))
>
> ## Define a CDF with one probeset per probe
> testcdf = new.env(hash=TRUE)
> for(i in seq_len(nrow(exprs(a))))
> assign(paste(i), matrix(i, ncol=1, nrow=1), envir=testcdf)
> a at cdfName = "testcdf"
>
> x = rma(a)
>
>
> ----------------------
>
> Background correcting
> Normalizing
> Calculating Expression
> Error in `colnames<-`(`*tmp*`, value = c("PP_T1_1.CEL",
> "PP_T1_2.CEL", :
> attempt to set colnames on object with less than two dimensions
>
> Enter a frame number, or 0 to exit
>
> 1: source("rma2.R")
> 2: eval.with.vis(ei, envir)
> 3: eval.with.vis(expr, envir, enclos)
> 4: rma(a)
> 5: `colnames<-`(`*tmp*`, value = c("PP_T1_1.CEL", "PP_T1_2.CEL",
> "PP_T2_1.CEL"
>
> Selection: 4
> Called from: eval(expr, envir, enclos)
> Browse[1]> ls()
> [1] "*tmp*" "background" "bgversion" "cols"
> "destructive"
> [6] "exprs" "ngenes" "normalize" "object" "pNList"
> [11] "rows" "subset" "verbose"
> Browse[1]> str(exprs)
>
> ........ long dump of text ....
>
>
> 496: stop("attempt to set colnames on object with less than two
> dimensions")
> 497: `colnames<-`(`*tmp*`, value = c("PP_T1_1.CEL", "PP_T1_2.CEL",
> "PP_T2_1.CEL", "PP_T2_2c.CEL"))
> 498: rma(a)
> 499: eval.with.vis(expr, envir, enclos)
> 500: eval.with.vis(ei, envir)
> 501: source("rma2.R")
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
> Selection:
>
>
> James W. MacDonald wrote:
>> Hi Holger,
>> Holger Schwender wrote:
>>> Hi Wolfgang,
>>>
>>> the difference between the two outputs x1 and x2 are due to that in
>>> x1 the probe intensities are background-corrected first and then
>>> normalized, while in x2 they are first normalized and then
>>> background-corrected.
>>>
>>> The constant values are a feature of median polish: If median polish
>>> is applied to four or six samples, then this can lead to constant
>>> values for some of the probe sets. This has been mentioned years ago
>>> by Jim (I couldn't find his original email on this topic in the
>>> mailing list archive).
>> Actually this only occurs with a small, odd number of samples, not
>> even. Well, it could happen for a small number of probesets if the
>> n/2 and n/2 + 1 ordered values were identical, but it is much more
>> common in the case of 3 or 5 samples.
>> Best,
>> Jim
>>>
>>> Best, Holger
>>>
>>>
>>> -------- Original-Nachricht --------
>>>> Datum: Tue, 03 Feb 2009 15:38:06 +0100 Von: Wolfgang Huber
>>>> <huber at ebi.ac.uk> An: Bioconductor Developers
>>>> <bioc-devel at stat.math.ethz.ch> Betreff: [Bioc-devel] Bug in rma:
>>>> zero variance of probeset summaries across arrays
>>>
>>>> Hi,
>>>>
>>>> I am getting puzzling results from the rma function in the affy
>>>> package, and furthermore they are different depending on whether
>>>> quantile normalisation is called within rma or explicitely. The
>>>> puzzling part is that some probesets, in a perfectly ordinary
>>>> experiment with 4 different arrays, are summarized with exactly
>>>> identical values across all arrays (i.e. zero variance); while the
>>>> underlying probe level data for these probesets is variable and
>>>> sits right in the middle of the dynamic range (i.e. not at any of
>>>> the extremes).
>>>>
>>>> The code: ---------
>>>>
>>>> library("affy") library("genefilter") library("preprocessCore")
>>>> options(error=recover)
>>>>
>>>> files = c("PP_T1_1.CEL", "PP_T1_2.CEL", "PP_T2_1.CEL",
>>>> "PP_T2_2c.CEL")
>>>>
>>>> for(f in files) download.file(file.path("http://www.ebi.ac.uk/~huber/pub
>>>> ", f), file.path(tempdir(), f))
>>>>
>>>> a = ReadAffy(filenames=file.path(tempdir(), files))
>>>>
>>>> x1 = rma(a)
>>>>
>>>> aq = a exprs(aq) = normalize.quantiles(exprs(aq)) x2 = rma(aq,
>>>> normalize=FALSE)
>>>>
>>>> probeVars = function(x) { r = rowSds(exprs(x)) ps =
>>>> names(which(r==0)) cat(sprintf("%s: %d probes with zero
>>>> variance.\n", deparse(substitute(x)), length(ps))) return(ps) }
>>>>
>>>> probeVars(a) ps = probeVars(x1) probeVars(x2)
>>>>
>>>> print(exprs(a)[yeast2cdf[[ps[1]]][,1],])
>>>>
>>>> -----------------------------------------------------------------------------
>>>>
>>>>
>>>> The output: ----------- Background correcting Normalizing
>>>> Calculating Expression Background correcting Calculating Expression
>>>>
>>>>
>>>> a: 0 probes with zero variance. x1: 3 probes with zero variance.
>>>> x2: 0 probes with zero variance.
>>>>
>>>> PP_T1_1.CEL PP_T1_2.CEL PP_T2_1.CEL PP_T2_2c.CEL 35244 185
>>>> 236 269 253 198153 226 314
>>>> 320 213 48500 375 370 408
>>>> 373 109094 125 157 201 175 144686
>>>> 107 131 153 100 214110 181
>>>> 296 307 234 47302 355 430
>>>> 397 282 108656 117 130 150
>>>> 118 147528 158 144 159 135 237691
>>>> 208 244 253 169 229910 94
>>>> 138 147 133
>>>>
>>>>> sessionInfo()
>>>> R version 2.9.0 Under development (unstable) (2009-02-03 r47829)
>>>> x86_64-unknown-linux-gnu
>>>>
>>>> locale:
>>>> LC_CTYPE
>>>> =
>>>> it_IT
>>>> .UTF
>>>> -8
>>>> ;LC_NUMERIC
>>>> =
>>>> C
>>>> ;LC_TIME
>>>> =
>>>> it_IT
>>>> .UTF
>>>> -8
>>>> ;LC_COLLATE
>>>> =
>>>> it_IT
>>>> .UTF
>>>> -8
>>>> ;LC_MONETARY
>>>> =
>>>> C
>>>> ;LC_MESSAGES
>>>> =
>>>> it_IT
>>>> .UTF
>>>> -8
>>>> ;LC_PAPER
>>>> =
>>>> it_IT
>>>> .UTF
>>>> -8
>>>> ;LC_NAME
>>>> =
>>>> C
>>>> ;LC_ADDRESS
>>>> =C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
>>>>
>>>>
>>>> attached base packages: [1] stats graphics grDevices datasets
>>>> utils methods base
>>>>
>>>> other attached packages: [1] yeast2cdf_2.3.0
>>>> preprocessCore_1.5.3 genefilter_1.23.1 [4] affy_1.21.7
>>>> Biobase_2.3.10 fortunes_1.3-6
>>>>
>>>> loaded via a namespace (and not attached): [1] affyio_1.11.3
>>>> annotate_1.21.3 AnnotationDbi_1.5.14 [4] DBI_0.2-4
>>>> RSQLite_0.7-1 splines_2.9.0 [7] survival_2.34-1
>>>> tools_2.9.0 xtable_1.5-4
>>>>
>>>>
>>>>
>>>> Best wishes Wolfgang
>>>>
>>>>
>>>> PS While trying to dissect this, I also experimented with
>>>> constructing special CDF environments (e.g. one that had one
>>>> 'probeset' for each probe), and I noted that repeatedly the C code
>>>> of rma caused core dumps. Perhaps a better response to unexpected
>>>> shapes of CDF environments would be to issue an error message. I
>>>> can provide details if interested.
>>>>
>>>> ---------------------------------------------------- Wolfgang
>>>> Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
>>>>
>>>> _______________________________________________ Bioc-devel at stat.math.ethz.ch
>>>> mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>> --
>>>
>>> _______________________________________________ Bioc-devel at stat.math.ethz.ch
>>> mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
> --
> ----------------------------------------------------
> Wolfgang Huber EMBL-EBI http://www.ebi.ac.uk/huber
>
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
More information about the Bioc-devel
mailing list