[BioC] [devteam-bioc] readGAlignmentPairs performance issue
Hervé Pagès
hpages at fhcrc.org
Tue May 20 21:27:52 CEST 2014
Hi Phil,
I don't have access to your BAM file but here are the timings I get for
readGAlignmentPairs(). (My file contains 100,000,000 pairs but I use
'which' to load only pairs located on chr1-4 so the result contains only
16,938,029 pairs):
- with BioC 2.13:
       user  system elapsed
    439.784  30.218 470.136
- with BioC 2.14:
       user  system elapsed
    319.212  11.492 331.201
So the new code is about 40% faster for me (it also uses about 20% less
memory).
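To make the arithmetic behind these figures explicit, here is a small base-R sketch. The variable names are made up for illustration; the numbers are the elapsed times quoted in this thread.

```r
# Elapsed wall-clock times in seconds, copied from the timings above.
elapsed_bioc213 <- 470.136
elapsed_bioc214 <- 331.201

# Throughput ratio: how many times faster the BioC 2.14 run is.
speedup <- elapsed_bioc213 / elapsed_bioc214

# Fraction of wall-clock time saved by the BioC 2.14 run.
time_saved <- 1 - elapsed_bioc214 / elapsed_bioc213

# Same kind of ratio for the timings Phil reported (3118 s vs 208 s),
# where the direction is reversed: BioC 2.14 is the slower one.
phil_ratio <- 3118 / 208

round(c(speedup = speedup, time_saved = time_saved, phil_ratio = phil_ratio), 2)
```

The first ratio comes out at roughly 1.42 (about 30% less wall-clock time), and the ratio for Phil's numbers at roughly 15.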
The timings you report below with BioC 2.14 for loading 108,592,829
pairs look reasonable to me. What is really surprising is the timing
you get with BioC 2.13: only 208s to load 108,592,829 pairs! This is
15x faster than with BioC 2.14! Can you confirm this? If so, would you
mind making the file accessible to us so we can have a look at it?
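For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of how the three readers can be timed side by side. It assumes the small example file ex1.bam shipped with Rsamtools as a stand-in for the real data, so the sequence name "seq1" and its range are specific to that file and the absolute timings will be tiny; only the relative pattern is of interest.

```r
library(GenomicAlignments)
library(Rsamtools)

# Assumption: the example BAM shipped with Rsamtools stands in for the real file.
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")

# Restrict loading to one region with 'which', as done in this thread.
# "seq1" (length 1575) is specific to ex1.bam.
param0 <- ScanBamParam(which=GRanges("seq1", IRanges(start=1, end=1575)))

# asMates=TRUE asks for mate pairing when reading through the BamFile object.
bf <- BamFile(fl, asMates=TRUE)

# Time each reader on the same file and region.
timings <- list(
    readGAlignmentPairs = system.time(galp <- readGAlignmentPairs(fl, param=param0)),
    readGAlignmentsList = system.time(galist <- readGAlignmentsList(bf, param=param0)),
    scanBam             = system.time(sb <- scanBam(bf, param=param0))
)
timings
```

Running the same script under both BioC releases and comparing the "elapsed" entries should show whether the slowdown is specific to one function or shared by all three.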
Thanks,
H.
On 05/20/2014 06:31 AM, Maintainer wrote:
> Hi Valerie,
>
> Thank you for getting back to me. Here are the times for
> readGAlignmentPairs, readGAlignmentsList, and scanBam using the code you
> sent.
>
> $readGAlignmentsList
>     user   system  elapsed
> 2529.510   57.487 2589.144
>
> $scanBam
>     user   system  elapsed
> 2465.353   49.404 2516.275
>
> $readGAlignmentPairs
>     user   system  elapsed
> 2560.754   56.612 2619.769
>
> Best wishes
> Phil
>
> On Fri, 2014-05-16 at 12:55 -0700, Valerie Obenchain wrote:
>> Hi Phil,
>>
>> We have several functions that call the same C code in the background.
>> To help isolate the problem can you please run your code with scanBam()
>> and readGAlignmentsList()?
>>
>> bf <- BamFile(fl, asMates=TRUE)
>> readGAlignmentsList(bf, param=param0)
>> scanBam(bf, param=param0)
>>
>> readGAlignmentsList() and readGAlignmentPairs() should be very close in
>> time. scanBam() will be faster but not by a huge amount.
>>
>> Thanks.
>> Valerie
>>
>>
>> On 05/13/2014 07:23 AM, Maintainer wrote:
>>> Hi Guys,
>>>
>>> I'm experiencing some performance issues with readGAlignmentPairs from the latest version of Bioconductor (GenomicAlignments_1.0.1, BioC 2.14, R 3.1.0).
>>>
>>> Reading RNASeq paired reads aligned to chr19 (mm9) from a BAM file containing 108,592,829 paired reads takes 3118s. The same code run in R-3.0.2, BioC 2.13, Rsamtools_1.14.3 takes 208s. The results are identical across the two versions.
>>>
>>> Here's the code:
>>>
>>> library(GenomicAlignments)
>>> library(Rsamtools)
>>>
>>> param0 <- ScanBamParam(which=GRanges(seqnames="chr19",
>>>                                      ranges=IRanges(start=1, end=chr19Length)))
>>> rd <- readGAlignmentPairs(bamFile, param=param0)
>>>
>>> Any ideas as to why this might be?
>>>
>>> Thanks in advance
>>>
>>> Phil East
>>>
>>>
>>>
>>> -- output of sessionInfo():
>>>
>>> R version 3.1.0 (2014-04-10)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_GB LC_NUMERIC=C LC_TIME=en_GB
>>> [4] LC_COLLATE=en_GB LC_MONETARY=en_GB LC_MESSAGES=en_GB
>>> [7] LC_PAPER=en_GB LC_NAME=C LC_ADDRESS=C
>>> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] grDevices datasets parallel stats graphics utils methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] GenomicAlignments_1.0.1 BSgenome_1.32.0 Rsamtools_1.16.0
>>> [4] Biostrings_2.32.0 XVector_0.4.0 GenomicRanges_1.16.3
>>> [7] GenomeInfoDb_1.0.2 IRanges_1.22.6 Biobase_2.24.0
>>> [10] BiocGenerics_0.10.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.0 bitops_1.0-6
>>> [5] brew_1.0-6 codetools_0.2-8 DBI_0.2-7 digest_0.6.4
>>> [9] fail_1.2 foreach_1.4.2 iterators_1.0.7 plyr_1.8.1
>>> [13] Rcpp_0.11.1 RSQLite_0.11.4 sendmailR_1.1-2 stats4_3.1.0
>>> [17] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>> ________________________________________________________________________
>>> devteam-bioc mailing list
>>> To unsubscribe from this mailing list send a blank email to
>>> devteam-bioc-leave at lists.fhcrc.org
>>> You can also unsubscribe or change your personal options at
>>> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc
>>>
>>
>>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319