[BioC] running time: countOverlaps & summarizedOverlaps vs. HTSeq

Nicolas Delhomme delhomme at embl.de
Wed Mar 28 13:21:11 CEST 2012


Hi Milica,

I do exactly the same thing in my package (easyRNASeq, still in the devel branch of Bioc), and it definitely does not take 20 hours to read "only" 20 million reads. Are you sure your machine is not swapping? That is, did you monitor the memory usage while the job was running?
 
It would be interesting (for me, at least) if you could try my package to get your count table. You can either retrieve the annotation from biomaRt or provide a GFF file. See the package vignette for the details, and perhaps these two posts on this mailing list:

https://stat.ethz.ch/pipermail/bioconductor/2012-February/043478.html
https://mailman.stat.ethz.ch/pipermail/bioconductor/2012-March/044124.html

To address it from the countOverlaps / summarizeOverlaps point of view, it would help if you could post your code and the output of sessionInfo().
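
As a rough illustration of why counting 20 million reads against a gene annotation should take minutes rather than hours, here is a minimal sketch of interval overlap counting. It is written in Python and is not the countOverlaps, summarizeOverlaps or HTSeq implementation; the function name, the toy coordinates and the single-chromosome, unstranded, closed-interval model are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def count_overlaps(genes, reads):
    """Count, for each gene, the reads sharing at least one base with it.

    Hypothetical helper for illustration only. `genes` and `reads` are
    lists of (start, end) closed intervals on one chromosome; strand,
    gapped alignments and multi-exon genes are ignored.
    """
    # Sort read starts and read ends once: O(n log n) in the number of reads.
    starts = sorted(start for start, _ in reads)
    ends = sorted(end for _, end in reads)

    counts = []
    for gene_start, gene_end in genes:
        # Reads that start at or before the gene end.
        started = bisect_right(starts, gene_end)
        # Reads that end strictly before the gene start cannot overlap,
        # and all of them are included in `started`, so subtract them.
        finished = bisect_left(ends, gene_start)
        counts.append(started - finished)
    return counts


# Toy data (made up): two genes and five reads.
genes = [(100, 200), (500, 900)]
reads = [(90, 110), (150, 180), (195, 250), (300, 350), (880, 950)]
print(count_overlaps(genes, reads))  # [3, 1]
```

Each gene costs two binary searches after a single sort of the reads, so the run time is dominated by reading the file rather than by the counting itself, provided everything fits in memory.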

HTH,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------





On 28 Mar 2012, at 13:04, Milica Krunic wrote:

> Hello!
> 
> 
> 
> I am working with cat RNA-Seq data and, after mapping, I wanted to get the
> count tables. So I tried countOverlaps and summarizeOverlaps in R, and
> HTSeq in Python. Using R, one sorted .bam file (~20*10^6 reads) takes
> ~20 hours, whichever of the two methods I use. In Python, it takes
> ~15 minutes. For the R methods I used a GRangesList object downloaded
> directly in R from Biomart. In HTSeq I used the GTF file provided on the
> Ensembl homepage. The average cat gene width is about 44000 in the
> GRangesList.
> Does anyone know why getting count tables in R takes so long compared to
> HTSeq?
> 
> 
> Thank you!
> 
> Best,
> Milica
> 


