[BioC] Excessive memory requirements of PING or bug?

Xuekui Zhang ubcxzhang at gmail.com
Mon May 21 19:09:36 CEST 2012


Hi Lars,

  Thanks for your feedback!
  Yes, cutting a chromosome before segmentation is not ideal, since the cut points might fall where the peaks/nucleosomes are.
  To avoid this problem, you could run a sliding window (e.g. window size 300 bp, step size 10 bp) along the chromosome, count the reads in each window, and cut at the valleys of the resulting read-count curve.
  In the next version of PING, we could integrate the cutting into the segmentation step to avoid cutting a chromosome in the wrong place.
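
  The sliding-window idea above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions, not PING code: find_cut_points and its chunk and search arguments are hypothetical names, and picking the lowest-count window near each nominal chunk boundary is just one simple way to define a "valley".

```r
# Hypothetical helper, not part of PING: propose cut points at read-count valleys.
# pos     : read start positions on one chromosome
# chr_len : chromosome length in bp
# chunk   : approximate desired chunk size in bp (assumed parameter)
# search  : distance around each nominal boundary to scan for a valley (assumed)
find_cut_points <- function(pos, chr_len, window = 300, step = 10,
                            chunk = 1e6, search = 1e5) {
  pos <- sort(pos)
  starts <- seq(1, max(1, chr_len - window + 1), by = step)
  # Number of reads falling in [start, start + window - 1] for every window
  counts <- findInterval(starts + window - 1, pos) -
    findInterval(starts - 1, pos)
  centers <- starts + window / 2
  n_chunks <- round(chr_len / chunk)
  if (n_chunks < 2) return(numeric(0))
  # Evenly spaced nominal boundaries, each moved to the nearest valley
  nominal <- chr_len * seq_len(n_chunks - 1) / n_chunks
  vapply(nominal, function(b) {
    idx <- which(abs(centers - b) <= search)
    centers[idx[which.min(counts[idx])]]
  }, numeric(1))
}

# Simulated example: 5000 reads on a 20 kb "chromosome" with a read-free
# gap at 9000-9600, so the cut should land inside that gap.
set.seed(1)
chr_len <- 20000
pos <- sample(setdiff(seq_len(chr_len), 9000:9600), 5000, replace = TRUE)
cuts <- find_cut_points(pos, chr_len, chunk = 10000, search = 2000)
pieces <- split(pos, findInterval(pos, cuts))
```

  Each element of pieces then holds the read positions of one chunk, which could be converted and segmented separately. For real data the chunk and search values would need tuning to the read depth and the memory available.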

Xuekui

On May 20, 2012, at 3:25 AM, Lars Hennig wrote:

> Yes, I tried. Restricting to single chromosomes of ~20 Mb did not help, but going to much smaller subchromosomal domains did eventually solve the problem. Still, slicing the genome into many small sections is not a preferred option.
> 
> Lars
> 
> -----Original Message-----
> From: Dan Tenenbaum [mailto:dtenenba at fhcrc.org] 
> Sent: Sunday, May 20, 2012 12:30 AM
> To: Xuekui Zhang
> Cc: Raphael Gottardo; Lars Hennig; Renan Sauteraud; bioconductor at r-project.org
> Subject: Re: [BioC] Excessive memory requirements of PING or bug?
> 
> [cc'ing Bioconductor list so others can benefit...]
> 
> On Sat, May 19, 2012 at 3:28 PM, Xuekui Zhang <ubcxzhang at gmail.com> wrote:
>> Hi Lars,
>> 
>>   Did you try to analyze each chromosome separately?
>>   Please let me know if that still does not solve the problem.
>> 
>> Xuekui
>> 
>> On May 19, 2012, at 5:35 PM, Raphael Gottardo wrote:
>> 
>> Hi Lars,
>> 
>> Xuekui, cc'ed here, will look into it.
>> 
>> Raphael
>> 
>> --
>> Raphael Gottardo, Associate Member
>> http://www.rglab.org
>> Fred Hutchinson Cancer Research Center Vaccine and Infectious Disease 
>> Division Public Health Sciences Division
>> 
>> 
>> 
>> On May 18, 2012, at 11:56 AM, Dan Tenenbaum wrote:
>> 
>> I'm cc'ing one of the PING maintainers who can perhaps shed more light 
>> on this.
>> Dan
>> 
>> 
>> On Thu, May 17, 2012 at 2:55 PM, Lars Hennig <Lars.Hennig at slu.se> wrote:
>> 
>> Dear PING maintainers,
>> 
>> 
>> Running PING with the example from the vignette works fine, but
>> segmentReads causes a "cannot allocate memory block of size
>> 68719476735.9 Gb" error when using my own ChIP-seq sample data (16 million
>> paired-end reads mapped with bowtie). This is an Arabidopsis sample
>> (genome size = 130 Mb).
>> 
>> A sample of 100000 of our own reads runs smoothly again, but 2.5 million
>> reads crash with a similarly high memory request as mentioned above.
>> Including snowfall or not has no effect.
>> 
>> 
>> Is there a way to trick PING into processing more than a few hundred
>> thousand reads with "normal" memory (I have 48 Gb available)? If PING really
>> has a very high memory need, this could be mentioned in the documentation.
>> 
>> 
>> Thank you very much,
>> 
>> 
>> Lars
>> 
>> 
>> Script:
>> 
>> library(ShortRead)
>> 
>> reads <- readAligned("reads_sorted.bam", type="BAM")
>> reads <- reads[!is.na(position(reads))]
>> reads <- reads[chromosome(reads) %in% c("Chr4")]
>> 
>> #reads <- reads[1:100000]
>> 
>> library(PING)
>> library(snowfall)
>> sfInit(parallel=TRUE,cpus=4)
>> sfLibrary(PING)
>> 
>> reads <- as(reads,"RangesList")
>> reads <- as(reads,"RangedData")
>> reads <- as(reads,"GenomeData")
>> 
>> seg <- segmentReads(reads, minReads=5, maxLregion=1200, minLregion=80, jitter=T)
>> 
>> traceback()
>> 2: .Call("segReadsAll", data, dataC, start, end, as.integer(jitter),
>>       paraSW, as.integer(maxStep), as.integer(minLregion), PACKAGE = "PING")
>> 1: segmentReads(reads_gd, minReads = 5, maxLregion = 1200, minLregion = 80,
>>       jitter = T)
>> 
>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> 
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C                 LC_NAME=C
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> 
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> 
>> other attached packages:
>> [1] snowfall_1.84       snow_0.3-9          PING_1.0.0
>> [4] chipseq_1.6.0       ShortRead_1.14.3    latticeExtra_0.6-19
>> [7] RColorBrewer_1.0-5  Rsamtools_1.8.4     lattice_0.20-6
>> [10] BSgenome_1.24.0     Biostrings_2.24.1   GenomicRanges_1.8.6
>> [13] IRanges_1.14.3      BiocGenerics_0.2.0
>> 
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.16.0      biomaRt_2.12.0      bitops_1.0-4.1
>> [4] GenomeGraphs_1.16.0 grid_2.15.0         hwriter_1.3
>> [7] RCurl_1.91-1        stats4_2.15.0       tools_2.15.0
>> [10] XML_3.9-4           zlibbioc_1.2.0
>> 
>> Dr. Lars Hennig
>> Professor of Genetics
>> Swedish University of Agricultural Sciences
>> Uppsala BioCenter
>> Department of Plant Biology and Forest Genetics
>> PO-Box 7080
>> SE-75007 Uppsala, Sweden
>> Lars.Hennig at vbsg.slu.se
>> Tel. +46 18 67 3326
>> Fax  +46 18 67 3389
>> 
>> Visiting address:
>> Uppsala BioCenter
>> Almas Allé 5
>> SE-75651 Uppsala, Sweden
>> Room A-489
>> 


