[BioC] Memory problem with rma()
cstrato
cstrato at aon.at
Mon Feb 17 15:23:29 CET 2014
Dear Damian,
In principle you should not have a memory problem. However, 5500 exon
arrays is quite a lot, so let me propose the following:
1. Do not run function rma() directly, but do it stepwise, i.e.:
data.bg.rma <- bgcorrect.rma(data.exon, ...)
data.qu.rma <- normalize.quantiles(data.bg.rma, ...)
data.mp.rma <- summarize.rma(data.qu.rma, ...)
You can find an example in script examples/script4exon.R (at line 750).
In this way you will not lose all your computation if anything goes
wrong at one step.
You may also need to set 'add.data=FALSE' in summarize.rma();
otherwise all expression data will be imported, which would cause a
memory problem, too.
Another way to run rma() stepwise is to use function express(); see the
example in script examples/script4exon.R (at line 785). When using
express() you could set parameter 'bufsize=4000', which will
reduce the basket size for each tree, thus consuming less RAM.
2. I would suggest using only 6 exon arrays first to see if everything
works fine. Then I would try to run 50 exon arrays in order to
- see if there is an initial memory problem
- estimate how long each step will take if you run all 5500 arrays
(approximately the time for 50 arrays x 110)
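The extrapolation in point 2 can be sketched in base R as follows. This is only a rough sketch: estimate_full_runtime() is a hypothetical helper, not part of xps, and it assumes that run time grows linearly with the number of arrays, which may not hold once memory becomes the bottleneck.

```r
# Hypothetical helper (not part of xps): extrapolate the run time of a
# pilot run to the full data set, assuming linear scaling with array count.
estimate_full_runtime <- function(pilot_seconds, n_pilot = 50, n_full = 5500) {
  scale <- n_full / n_pilot   # 5500 / 50 = 110
  pilot_seconds * scale
}

# Time one step of the pilot run with system.time(); Sys.sleep() is only a
# placeholder for the real step (e.g. background correction of 50 arrays).
pilot <- system.time(Sys.sleep(0.1))[["elapsed"]]
cat(sprintf("Estimated time for all arrays: %.2f hours\n",
            estimate_full_runtime(pilot) / 3600))
```

Timing each of the three steps separately shows which step dominates before committing to the full run.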
3. Please run everything with 'verbose=TRUE' so that you can see the
output interactively. Maybe you could pipe the output to a text file.
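One way to capture the output in a text file from within R is base R's sink(). The following is a minimal sketch; the file name "xps_run.log" is a placeholder, and it assumes that the messages of interest go through R's own output and message streams. Output printed directly by compiled code may bypass sink(), in which case redirecting at the shell level is probably the safer choice.

```r
log_file <- "xps_run.log"          # placeholder file name
con <- file(log_file, open = "wt")

sink(con, split = TRUE)            # normal output: file and console
sink(con, type = "message")        # messages and warnings: file only

# Placeholder for the real processing steps run with verbose=TRUE
cat("starting background correction\n")
message("this message goes to the log file")

# Restore the default streams and close the file
sink(type = "message")
sink()
close(con)
```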
4. Since you assume that there may be a memory problem: maybe you can
run top (or something else) and check RSIZE/VSIZE from time to time.
Maybe you can create a script which exports the memory consumption, e.g.
every 10 min.
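Such a monitoring script could look roughly like the following sketch in base R, started in a second R session next to the main job. It assumes a Unix-like system where the ps command accepts "-o rss= -o vsz= -p <pid>"; read_memory_kb(), log_memory() and the log file name "memory.log" are hypothetical names chosen for illustration.

```r
# Hypothetical helper: return resident (RSS) and virtual (VSZ) memory of a
# process in KB as reported by ps, or NULL if the process cannot be queried.
read_memory_kb <- function(pid) {
  out <- suppressWarnings(
    system2("ps", c("-o", "rss=", "-o", "vsz=", "-p", as.character(pid)),
            stdout = TRUE, stderr = FALSE))
  if (length(out) == 0) return(NULL)
  vals <- as.numeric(strsplit(trimws(out[1]), "\\s+")[[1]])
  c(rss_kb = vals[1], vsz_kb = vals[2])
}

# Hypothetical helper: append "time rss_kb vsz_kb" to a log file at a fixed
# interval (default 600 s = 10 min) until the process ends or n_samples
# samples have been written. Returns the number of samples written.
log_memory <- function(pid, logfile = "memory.log",
                       interval_sec = 600, n_samples = Inf) {
  i <- 0
  while (i < n_samples) {
    mem <- read_memory_kb(pid)
    if (is.null(mem)) break
    cat(sprintf("%s %.0f %.0f\n", format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                mem[["rss_kb"]], mem[["vsz_kb"]]),
        file = logfile, append = TRUE)
    i <- i + 1
    if (i < n_samples) Sys.sleep(interval_sec)
  }
  invisible(i)
}

# Example: take a single sample of the current R session
log_memory(Sys.getpid(), n_samples = 1)
```

In practice the PID passed to log_memory() would be that of the R process running the analysis, and n_samples would be left at its default so that logging continues until that process ends.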
5. I am not sure if running the code on a cluster is a good idea.
Do you run your code on a node which is used exclusively for this
purpose?
My suggestion would be to run your code on a machine where nothing else
is running, since I assume that for 5500 exon arrays you will need at
least one week (but see point 2).
(Note: In 2009 a customer was running 23000 HGU-133_Plus2 arrays on a
machine and with his help I could eliminate (hopefully) all memory
problems, some of which appeared only after 2000 arrays. In his case
memory consumption initially increased to 7.8 GB, but after solving the
memory problems it remained at 3.0 GB.)
Best regards,
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._
On 2/16/14 8:07 PM, Damian Plichta [guest] wrote:
> Hi,
>
> I am running rma() to correct, normalize and summarize a batch of ca. 5500 arrays. I have currently a memory limit of 8gb and the procedures exceeds that. I am guessing that it breaks at the background correction step. I investigated the temporary directory and it's only file called tmp_310151_rbg.root that was modified (size of that file is 16gb). I attached the code below.
>
> I tried the latest ROOT version and the one recommended at bioconductor (root_v5.34.14,root_v5.34.05).
>
> Any idea why is there the memory issue?
>
> scheme.HuEx <- import.exon.scheme(
> filename = "Scheme_HuEx-1_0v2r2_hg19",
> layoutfile = "affyHuExome_design/HuEx-1_0-st-v2.r2.clf",
> schemefile = "affyHuExome_design/HuEx-1_0-st-v2.r2.pgf",
> probeset = "affyHuExome_design/HuEx-1_0-st-v2.na33.1.hg19.probeset.csv",
> transcript = "affyHuExome_design/HuEx-1_0-st-v2.na33.1.hg19.transcript.csv")
>
> scheme.HuEx <- root.scheme("Scheme_HuEx-1_0v2r2_hg19.root")
>
> data.HuEx <- import.data(
> scheme.HuEx,
> filename = "fhsCEL",
> filedir = "normalizationXPS/",
> celdir = "expression_CEL_raw/"
> )
>
> data.HuEx <- root.data(scheme.HuEx, rootfile="fhsCEL_cel.root")
>
> rma.HuEx.transcript <- rma(data.HuEx, filename="HuEx_RMAquantile",
> filedir="normalizationXPS",
> tmpdir = "normalizationXPS/tmpDir",
> add.data=FALSE, background="antigenomic", normalize=TRUE,
> option="transcript", exonlevel="core")
>
>
> -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=C LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] xps_1.22.2
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.2
>
> --
> Sent via the guest posting facility at bioconductor.org.
>