[BioC] Memory problem with rma()

Steve Piccolo stephen.piccolo at hsc.utah.edu
Tue Feb 18 00:50:12 CET 2014


Hi Damian,

I receive the digest version of the BioC mailing list, so I apologize if
someone has already given this reply, but several Bioconductor packages
are designed for processing very large Affymetrix data sets. Both our own
SCAN.UPC package and the fRMA package normalize one sample at a time and
thus can be applied to data sets of any size. Another option would be the
aroma.affymetrix package, which is designed for memory-efficient RMA
normalization.
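
To illustrate why one-sample-at-a-time processing keeps memory use flat,
here is a minimal base-R sketch. It is not the SCAN.UPC or fRMA API:
normalize_one() and process_samples() are hypothetical stand-ins, and the
input files are small simulated intensity vectors rather than CEL files.

```r
# Hypothetical per-sample normalizer (NOT a real package function):
# log2-transform one intensity vector and center it on its own median.
normalize_one <- function(intensities) {
  logged <- log2(intensities)
  logged - median(logged)
}

# Read, normalize and write one sample per iteration, so that only a
# single sample is held in memory at any time.
process_samples <- function(in_files, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  stems <- sub("\\.rds$", "", basename(in_files))
  out_files <- file.path(out_dir, paste0(stems, "_norm.rds"))
  for (i in seq_along(in_files)) {
    raw <- readRDS(in_files[i])                # load a single sample
    saveRDS(normalize_one(raw), out_files[i])  # store result on disk
  }
  out_files
}

# Simulated input (assumption): three small samples saved as .rds files.
set.seed(1)
in_dir <- file.path(tempdir(), "raw_samples")
dir.create(in_dir, showWarnings = FALSE)
in_files <- file.path(in_dir, sprintf("sample%02d.rds", 1:3))
for (f in in_files) saveRDS(2^rnorm(1000, mean = 8), f)

out_files <- process_samples(in_files, file.path(tempdir(), "normalized"))
```

Because no step needs values from any other sample, peak memory depends
on the size of one array and not on the number of arrays.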

Hope that helps! If you end up trying SCAN.UPC, you might also try the
option for processing multiple samples in parallel, which you should be
able to do on a computer cluster.
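
Since each sample is independent, the work can be spread over several
cores. The sketch below uses the 'parallel' package that ships with R; it
is a single-node illustration under stated assumptions, not the SCAN.UPC
parallel interface. summarize_sample() is a hypothetical worker and the
inputs are simulated.

```r
library(parallel)  # part of the standard R distribution

# Hypothetical worker (assumption): handles one sample on its own and
# returns a small summary, so nothing large is sent back to the master.
summarize_sample <- function(path) {
  x <- readRDS(path)
  c(n = length(x), median_log2 = median(log2(x)))
}

# Simulated input (assumption): four small samples saved as .rds files.
set.seed(2)
paths <- file.path(tempdir(), sprintf("par_sample%02d.rds", 1:4))
for (p in paths) saveRDS(2^rnorm(500, mean = 8), p)

# mclapply() forks workers on Unix-alikes; Windows supports mc.cores = 1 only.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(paths, summarize_sample, mc.cores = n_cores)

# One row per sample.
summary_table <- do.call(rbind, results)
```

On a cluster with a queuing system, the same idea would more likely be
expressed as one job per sample or per block of samples.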

Regards,
-Steve



On 2/17/14, 4:00 AM, "bioconductor-request at r-project.org"
<bioconductor-request at r-project.org> wrote:

>Date: Mon, 17 Feb 2014 00:39:48 -0300
>From: Benilton Carvalho <beniltoncarvalho at gmail.com>
>To: plichta at cbs.dtu.dk
>Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>, Sean Davis <sdavis2 at mail.nih.gov>
>Subject: Re: [BioC] Memory problem with rma()
>
>Thanks, Damian,
>
>that indicates that 'ff' hit its maximum limit on object
>dimensions... :-(
>
>Thanks for letting me know,
>
>b
>
>
>2014-02-17 0:22 GMT-03:00 <plichta at cbs.dtu.dk>:
>
>>Hi Benilton,
>>
>>I tried oligo and it choked:
>>
>>>...
>>>raw <- read.celfiles(cels)
>>
>>Loading required package: pd.huex.1.0.st.v2
>>Loading required package: RSQLite
>>Loading required package: DBI
>>Platform design info loaded.
>>Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 1 and .Machine$integer.max") :
>>   missing value where TRUE/FALSE needed
>>In addition: Warning message:
>>In ff(initdata = initdata, vmode = vmode, dim = dim, pattern = file.path(ldPath(),  :
>>   NAs introduced by coercion
>>
>>Do you know what this error indicates?
>>
>>Thanks,
>>
>>Damian
>>
>>> Hi Damian,
>>>
>>> Soon, Christian should reply to you.
>>>
>>> In the meantime, for my personal interest and to define plans for the
>>> oligo package, would you be willing to try processing your set with
>>> oligo?
>>>
>>> library(ff)
>>> library(oligo)
>>> cels = list.celfiles()
>>> raw = read.celfiles(cels)
>>> res = rma(raw)
>>>
>>> If you have multiple cores available, before loading oligo, load a
>>> parallel front-end:
>>>
>>> library(doMC)
>>> registerDoMC(4)
>>>
>>> Let me know how it goes, if you have some time to spare...
>>>
>>> Thanks a million, benilton
>>> On Feb 16, 2014 7:15 PM, <plichta at cbs.dtu.dk> wrote:
>>>
>>>> I don't get a proper error message because I'm running the R session
>>>> in an interactive shell on a cluster (queuing system). When the memory
>>>> limit of 8 GB is reached, my interactive shell is terminated by the
>>>> queuing system.
>>>>
>>>> > And what was the actual error that you got?
>>>> >
>>>> > Sean
>>>> >
>>>> >
>>>> >
>>>> > On Sun, Feb 16, 2014 at 2:07 PM, Damian Plichta [guest] <
>>>> > guest at bioconductor.org> wrote:
>>>> >
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> I am running rma() to background-correct, normalize and summarize a
>>>> >> batch of ca. 5500 arrays. I currently have a memory limit of 8 GB,
>>>> >> and the procedure exceeds it. I am guessing that it breaks at the
>>>> >> background correction step. I inspected the temporary directory, and
>>>> >> the only file that was modified is tmp_310151_rbg.root (its size is
>>>> >> 16 GB). I attached the code below.
>>>> >>
>>>> >> I tried both the latest ROOT version and the one recommended at
>>>> >> Bioconductor (root_v5.34.14, root_v5.34.05).
>>>> >>
>>>> >> Any idea why there is a memory issue?
>>>> >>
>>>> >> scheme.HuEx <- import.exon.scheme(
>>>> >>                 filename = "Scheme_HuEx-1_0v2r2_hg19",
>>>> >>                 layoutfile = "affyHuExome_design/HuEx-1_0-st-v2.r2.clf",
>>>> >>                 schemefile = "affyHuExome_design/HuEx-1_0-st-v2.r2.pgf",
>>>> >>                 probeset = "affyHuExome_design/HuEx-1_0-st-v2.na33.1.hg19.probeset.csv",
>>>> >>                 transcript = "affyHuExome_design/HuEx-1_0-st-v2.na33.1.hg19.transcript.csv")
>>>> >>
>>>> >> scheme.HuEx <- root.scheme("Scheme_HuEx-1_0v2r2_hg19.root")
>>>> >>
>>>> >> data.HuEx <- import.data(
>>>> >>                 scheme.HuEx,
>>>> >>                 filename = "fhsCEL",
>>>> >>                 filedir = "normalizationXPS/",
>>>> >>                 celdir = "expression_CEL_raw/"
>>>> >>                 )
>>>> >>
>>>> >> data.HuEx <- root.data(scheme.HuEx, rootfile="fhsCEL_cel.root")
>>>> >>
>>>> >> rma.HuEx.transcript <- rma(data.HuEx, filename="HuEx_RMAquantile",
>>>> >>                 filedir="normalizationXPS",
>>>> >>                 tmpdir = "normalizationXPS/tmpDir",
>>>> >>                 add.data=FALSE, background="antigenomic",
>>>> >>                 normalize=TRUE,
>>>> >>                 option="transcript", exonlevel="core")
>>>> >>
>>>> >>
>>>> >>  -- output of sessionInfo():
>>>> >>
>>>> >> R version 3.0.2 (2013-09-25)
>>>> >> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>> >>
>>>> >> locale:
>>>> >>  [1] LC_CTYPE=C                 LC_NUMERIC=C
>>>> >>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>> >>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>> >>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>> >>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>> >>
>>>> >> attached base packages:
>>>> >> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>> >>
>>>> >> other attached packages:
>>>> >> [1] xps_1.22.2
>>>> >>
>>>> >> loaded via a namespace (and not attached):
>>>> >> [1] tools_3.0.2
>>>> >>
>>>> >> --
>>>> >> Sent via the guest posting facility at bioconductor.org.
>>>> >>
>>>> >> _______________________________________________
>>>> >> Bioconductor mailing list
>>>> >> Bioconductor at r-project.org
>>>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> >> Search the archives:
>>>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> >>
>>>> >
>>>>
>>>>
>>>
>>
>>
>>
>
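
The 'ff' limit mentioned in the quoted thread can be checked with simple
arithmetic. The sketch below assumes that a HuEx-1_0-st-v2 CEL file holds
a 2560 x 2560 grid of cells and that all intensities go into one object
of cells x arrays elements; neither detail is stated in the thread. It
reproduces the same kind of error, not oligo's actual code path.

```r
# Assumption: one HuEx-1_0-st-v2 array has 2560 x 2560 cells.
cells_per_array <- 2560 * 2560
n_arrays <- 5500  # "ca. 5500 arrays" in the original post

# Total elements needed (computed in double precision, so no overflow).
total_cells <- cells_per_array * n_arrays
exceeds_limit <- total_cells > .Machine$integer.max

# Coercing a value beyond the integer range yields NA with a warning.
as_int <- suppressWarnings(as.integer(total_cells))

# A range check on NA cannot be evaluated by if(), which raises an error.
check <- tryCatch(
  if (as_int < 0 || as_int > .Machine$integer.max) "out of range" else "in range",
  error = function(e) conditionMessage(e)
)

# Largest number of such arrays that fits within the integer limit.
max_arrays <- floor(.Machine$integer.max / cells_per_array)
```

Under these assumptions the object would need about 3.6e10 elements, far
beyond .Machine$integer.max (2147483647), and only a few hundred arrays
would fit in a single object of this kind.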
