[Bioc-sig-seq] Slow/hanged QA on Illumina Data
Martin Morgan
mtmorgan at fhcrc.org
Fri Sep 11 00:16:06 CEST 2009
Pratap, Abhishek wrote:
> Hi Ivan
>
> I suspected it but am not 100% sure. The %CPU for my R process fluctuates between 60 and 100, and swap usage looks OK to me.
>
> I remember there was some talk on the mailing list that dev version (R/ShortRead) is a lot faster.
Hi Abhishek,
My experience is in the 5 minutes / lane range for qa, so it would seem
to be running a long time. The ... arguments to qa are passed to the
function that reads individual files (readAligned), so you can include a
verbose=TRUE argument for a bit more chat. You might write a short
script along the lines of
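gcinfo(TRUE)    # report each garbage collection as it happens
library(ShortRead)
dirPath <- "some/directory"
pattern <- "<some_pattern>"
## stop early unless the pattern matches exactly the files I expect
stopifnot(setequal(list.files(dirPath, pattern), <files I'm expecting>))
qa <- qa(dirPath, pattern, type=<my type>, verbose=TRUE)
save(qa, file=<some file>)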
Try running this from the command line
R -f MyScript.R
The gcinfo(TRUE) call will cause R to start printing messages such as
Garbage collection 3 = 2+0+1 (level 0) ...
7.4 Mbytes of cons cells used (39%)
1.3 Mbytes of vectors used (21%)
Garbage collection 4 = 3+0+1 (level 0) ...
10.3 Mbytes of cons cells used (55%)
which indicates that R is busy managing its memory even before starting
to do real work. So give R more memory until it quiets down
R --min-nsize=20M --min-vsize=4G -f MyScript.R
(these values are my best guess at what is appropriate; --min-nsize is a
number of cons cells and --min-vsize is the vector heap size in bytes,
the 'M' is 'million', the 'G' Giga).
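To put a number on how busy the garbage collector is, something along these lines might help. It is only a sketch using base R's gc.time(); the lapply() call is a stand-in workload I made up for illustration, and in practice the qa() call would go in its place.

```r
# Time spent in garbage collection so far (cumulative, in seconds)
before <- gc.time()

# Stand-in workload that allocates many small objects;
# replace with the real qa() call when measuring for real
workload <- lapply(seq_len(1e5), function(i) c(i, i))

after <- gc.time()

# Time spent in garbage collection while the workload ran
gcSeconds <- (after - before)[1:3]
names(gcSeconds) <- c("user", "system", "elapsed")
print(gcSeconds)
```

If the elapsed figure is a large share of the total run time, that would be consistent with R needing a bigger starting heap.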
qa() should be reading one file at a time, so the memory requirement is
for the largest (product of reads and cycles) lane. You should be able
to get a handle on the size of that using readAligned().
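As a back-of-the-envelope illustration of 'product of reads and cycles', here is a rough sketch in base R. The function name, the example lane of 10 million reads, and the figure of two bytes per read position (one for the base, one for the quality score) are my assumptions for illustration; identifiers, alignment columns and object overhead are ignored, so the real requirement will be higher.

```r
# Rough lower bound on the memory needed to hold one lane in R.
# Assumption: bytesPerCell bytes per read position (default 2: one byte
# for the base, one for its quality score); everything else is ignored.
estimateLaneBytes <- function(nReads, nCycles, bytesPerCell = 2) {
    nReads * nCycles * bytesPerCell
}

# Hypothetical lane: 10 million reads of 75 cycles
bytes <- estimateLaneBytes(10e6, 75)
cat(sprintf("approx. %.2f GB\n", bytes / 2^30))

# With a lane already read in via readAligned(), the actual values could be
# taken from the object instead, e.g. (not run here):
#   aln <- readAligned(dirPath, pattern, type=<my type>)
#   estimateLaneBytes(length(aln), max(width(aln)))
```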
How many reads and cycles are there in your data?
Martin
> Thanks,
> -Abhi
>
> -----Original Message-----
> From: Ivan Gregoretti [mailto:ivangreg at gmail.com]
> Sent: Thursday, September 10, 2009 5:01 PM
> To: Pratap, Abhishek
> Cc: Martin Morgan; bioc-sig-sequencing at r-project.org
> Subject: Re: [Bioc-sig-seq] Slow/hanged QA on Illumina Data
>
> It sounds like you may have run out of memory in your linux box.
>
> When I run qa() in my 16GB machine, it usually uses ~14GB just for
> this qa() process.
>
> That is for 36 bases. Maybe, if you are running 75 bases, you just
> used all the RAM.
>
> Is the processor running at 100%? Check it by issuing 'top' at the
> command line. If it is, then you are good.
>
> 'top' can also tell you if you are swapping wildly. (Swapping is when
> your machine runs out of RAM and starts storing data in a
> temporary location on your hard drive to avoid crashing.)
>
> Ivan
>
>
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
> 5 Memorial Dr, Building 5, Room 205.
> Bethesda, MD 20892. USA.
> Phone: 1-301-496-1592
> Fax: 1-301-496-9878
>
>> sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-08-12 r49169)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> On Thu, Sep 10, 2009 at 4:35 PM, Pratap, Abhishek
> <APratap at som.umaryland.edu> wrote:
>> Hi Martin
>>
>>
>>
>> I am noticing lethargic or maybe hung processing with the qa()
>> function in ShortRead. I know I have raised this question before.
>> Recently I updated my R to the dev version and installed the latest
>> Bioconductor. Currently I am trying to run qa() on 8 lanes of data for
>> 75 bp reads. The machine has 16 cores and 16 GB RAM.
>>
>>
>>
>> It has been two hours since the processing started. Does it
>> usually take so long? I am not sure. Will using Rmpi help?
>>
>>
>>
>>
>>
>> Thanks,
>>
>> -Abhi
>>
>>
>>
>> sessionInfo()
>>
>> R version 2.9.2 (2009-08-24)
>>
>> x86_64-unknown-linux-gnu
>>
>>
>>
>> locale:
>>
>> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
>> LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;
>> LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;
>> LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>
>>
>>
>> attached base packages:
>>
>> [1] stats graphics grDevices utils datasets methods base
>>
>>
>>
>>
>> other attached packages:
>>
>> [1] ShortRead_1.2.1 lattice_0.17-25 BSgenome_1.12.3
>> Biostrings_2.12.8
>>
>> [5] IRanges_1.2.3
>>
>>
>>
>> loaded via a namespace (and not attached):
>>
>> [1] Biobase_2.4.1 grid_2.9.2 hwriter_1.1
>>
>>
>>
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>