[Bioc-devel] Making GenomicAlignments::readGAlignmentPairs() fail fast if given bad seqlevels in `which`
Peter Hickey
peter.hickey at gmail.com
Wed Mar 16 19:53:44 CET 2016
Hi,
GenomicAlignments::readGAlignmentPairs() can take a while to
(correctly) fail if the `which` parameter contains a "bad" seqlevel.
It'd be great if it failed early in the following scenario (just
experienced).
An example BAM is available from
https://www.dropbox.com/sh/4avqxuqnhlv3r9c/AADqx-XqXV6c7Dc_SaSUq324a?dl=0
(110 MB; sorry, it needs to be large-ish in order to notice the
problem). The following code ought to reproduce the problem. Here I am
taking the example BAM of mouse data mapped to mm10 and using a
`which` based on hg19 (it was mistakenly assuming all my data were
human that led me to this problem). When a single "bad" seqlevel is
provided via `which` then it errors fast and with a helpful error.
However, if the `which` contains multiple seqlevels, some "good" and
some "bad", then it seemingly just hangs. I initially thought it had
just frozen indefinitely but it actually eventually returns the
correct error.
It'd be great if it failed fast in this situation.
Thanks,
Pete
library(GenomicAlignments)
library(BSgenome.Hsapiens.UCSC.hg19)
si <- seqinfo(BSgenome.Hsapiens.UCSC.hg19)
# mouse data mapped to mm10 (temporarily available from
https://www.dropbox.com/sh/4avqxuqnhlv3r9c/AADqx-XqXV6c7Dc_SaSUq324a?dl=0)
file <- "~/Desktop/tmp/SRR1781315.markdup.bam"
# Errors fast and helpfully because chr20 doesn't exist in mouse
readGAlignmentPairs(file, param = ScanBamParam(which = as(si, "GRanges")[20]))
# Takes a long time to error if some seqlevels exist (chr19) and some
don't exist (chr20) in sample
readGAlignmentPairs(file, param = ScanBamParam(which = as(si,
"GRanges")[19:20]))
> sessionInfo()
R Under development (unstable) (2016-03-11 r70310)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils
datasets methods base
other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.39.4
rtracklayer_1.31.7
[4] GenomicAlignments_1.7.20 Rsamtools_1.23.3
Biostrings_2.39.12
[7] XVector_0.11.7 SummarizedExperiment_1.1.22
Biobase_2.31.3
[10] GenomicRanges_1.23.24 GenomeInfoDb_1.7.6
IRanges_2.5.40
[13] S4Vectors_0.9.42 BiocGenerics_0.17.3
repete_0.0.0.9000
[16] devtools_1.10.0
loaded via a namespace (and not attached):
[1] XML_3.98-1.4 digest_0.6.9 bitops_1.0-6
zlibbioc_1.17.0 BiocParallel_1.5.20
[6] tools_3.3.0 RCurl_1.95-4.8 memoise_1.0.0
More information about the Bioc-devel
mailing list