[BioC] Rsamtools Memory Issue

Martin Morgan mtmorgan at fhcrc.org
Fri Oct 1 22:54:36 CEST 2010


On 10/01/2010 01:12 PM, Hollis Wright wrote:
> Hello, all; I am having a problem with the readPileup() function in
> Rsamtools. I'm trying to read in a pileup file generated by SamTools
> of about 4GB in size, on a Mac Pro, OS X 10.5. Top indicates that I
> have 18 GB of memory free, and I am using a freshly built R 2.11.1
> with the x86_64 arch specified. However, when I attempt to read in
> the file:
> 
> lane1 <- readPileup("test.pup", variant="SNP")
> 
> I eventually get a malloc error and the error "Cannot allocate vector
> of size 500 Mb". I have specified ulimit unlimited for the shell that
> I'm running R in and have difficulty believing that a 500MB
> contiguous space is unavailable in 18 GB of free RAM. Top only ever
> indicates that R is using 2-3GB; Samtools has had no problems
> processing the files up to this point and a quick inspection seems to
> indicate that they are proper Pileup files. Any thoughts?

Hi Hollis --

Partly, the message is saying "I've already allocated a bunch of memory,
and now I'm trying to allocate 500 MB more and can't find room for it".
The 2-3 GB of use reported by top needs clarification; it could be a
misrepresentation on the part of top, but it would also help to report
sessionInfo() (I'm not a Mac person so can't provide detail on 32 vs. 64
bit memory use...).
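
A minimal sketch of the kind of information that would help here, using
only base R. The variable names pointer_bytes and mem are just for
illustration; the pointer size is one quick way to confirm that the
running R really is a 64-bit build.

```r
# Report the R version, platform, and attached packages.
print(sessionInfo())

# 8 on a 64-bit build of R, 4 on a 32-bit build.
pointer_bytes <- .Machine$sizeof.pointer
cat("pointer size (bytes):", pointer_bytes, "\n")

# Architecture string as R reports it.
cat("arch:", R.version$arch, "\n")

# Memory currently used by R objects, as R itself accounts for it;
# this can be compared with what top reports for the process.
mem <- gc()
print(mem)
```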

samtools does stream processing so doesn't run into memory limits; this
is very different from the R programming model, where data generally
resides in memory.
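
To illustrate the difference, here is a rough sketch of stream-style
processing in base R: the file is read through an open connection a few
lines at a time, so only one chunk is in memory at once. The temporary
file and its contents are made up for the example and are not real
pileup data.

```r
# Hypothetical stand-in for a large file: a small temporary text file.
path <- tempfile(fileext = ".txt")
writeLines(sprintf("chr1\t%d\tA\t10", 1:25), path)

con <- file(path, open = "r")
n_lines <- 0L
repeat {
    # Read at most 10 lines; the connection remembers its position.
    chunk <- readLines(con, n = 10L)
    if (length(chunk) == 0L) break      # end of file
    # Process the chunk here, then let it go out of scope.
    n_lines <- n_lines + length(chunk)
}
close(con)
unlink(path)

cat("lines seen:", n_lines, "\n")
```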

The code you execute ends up more or less directly at
Rsamtools:::.readPileup_SNP and Rsamtools:::.readPileup_table. These
rely on read.table to input the data, and the ... arguments in the
original call are passed down to read.table. So you can skip lines and
limit the number of records read with the arguments 'skip' and 'nrows',
as documented on ?read.table. (samtools does produce multi-line records,
so an unfortunate choice of skip / nrows will begin / end in the middle
of a record; you could use read.table alone with similar arguments to
peek at the file and get the breaks right.)
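
A minimal sketch of the skip / nrows idea with plain read.table, assuming
a tab-separated file. The toy file, its columns, and the block size are
invented for illustration; a real pileup file has a different layout,
and block boundaries would still need to be checked against multi-line
records as noted above.

```r
# Hypothetical toy file; the columns are illustrative only.
path <- tempfile(fileext = ".pup")
writeLines(sprintf("chr1\t%d\tA\tG\t%d", 1:12, 30:41), path)

# Peek at the first few lines without loading the whole file.
head_rows <- read.table(path, sep = "\t", nrows = 3,
                        stringsAsFactors = FALSE)
print(head_rows)

# Read the file in blocks of 'block' lines using skip / nrows.
block <- 5L
skip <- 0L
total <- 0L
repeat {
    rows <- tryCatch(
        read.table(path, sep = "\t", skip = skip, nrows = block,
                   stringsAsFactors = FALSE),
        error = function(e) NULL)   # read.table errors when no lines remain
    if (is.null(rows) || nrow(rows) == 0L) break
    total <- total + nrow(rows)     # process the block here, then discard it
    skip <- skip + nrow(rows)
    if (nrow(rows) < block) break   # short block: end of file reached
}
unlink(path)

# Per the description above, the same arguments should pass through
# readPileup's ... to read.table (untested here; values are made up):
# lane1 <- readPileup("test.pup", variant = "SNP", skip = 0, nrows = 1e6)
```

Note that each call with 'skip' re-scans the file from the start, so for
very many blocks this gets slow; it is meant for limiting memory use, not
for speed.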

Martin

> 
> Hollis Wright, PhD Oregon Clinical and Translational Research
> Institute Oregon Health and Science University 


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


