[BioC] Rsamtools hangs reading SOLiD bam files
Martin Morgan
mtmorgan at fhcrc.org
Tue Oct 12 18:57:33 CEST 2010
On 10/12/2010 12:27 AM, Asta Laiho wrote:
> Hi,
>
> I'm trying to work with *.bam and *.bai files produced using Bioscope (SOLiD related software package, v.1.2.1). I tried two examples in the Rsamtools manual (the one at the top of page 2 for querying the reads in a given range, and the one at the bottom of page 4 for calculating coverage for chunks of the file). I tried files of different sizes (35 MB, 1.8 GB), but the code in both examples just kept running without any error messages and without producing results in any reasonable time. I even left it running overnight but it still hadn't finished. My computer runs Mac OS X 10.6.4 with 8 GB memory. The session info is attached below. Are there any known issues with Rsamtools and bam/bai files originating from SOLiD Bioscope software?
>
> Many thanks for all advice in advance,
Hi Asta --
I don't know of outstanding issues. If the query is expected to retrieve
a 'small' number of reads (millions, say) then it should be fast (as in
not enough time to check your email). If it's returning large numbers of
reads then memory might become a problem.
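
One way to judge whether a query is 'small' before pulling the reads into memory is to count the records in the range first. The following is only a sketch under assumptions: it uses the small example BAM file shipped with Rsamtools as a stand-in for your own file, the range (seq1, 1-500) is an arbitrary illustration, and it is written against a recent Rsamtools where 'which' accepts a GRanges object.

```r
library(Rsamtools)
library(GenomicRanges)

# Stand-in for your own BAM file: the example file shipped with Rsamtools.
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Illustrative range only; substitute a sequence name and coordinates
# that exist in your own file.
which <- GRanges("seq1", IRanges(1, 500))

# countBam() tallies records and nucleotides in the range without
# returning the reads themselves, so it is cheap in memory.
cnt <- countBam(fl, param = ScanBamParam(which = which))
cnt[, c("space", "start", "end", "records", "nucleotides")]
```

If the 'records' column is in the millions or fewer, retrieving the same range with scanBam should be quick; a much larger count suggests querying smaller ranges.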
If there is a 'bug', my guess would be that it involves integer overflow
in the index -- seeking a read that is late in a very large BAM file.
So...

verify basic functionality with

  library(Rsamtools); example(scanBam)

try accessing a few reads at the beginning of the first reference
sequence returned by

  scanBamHeader(fl)[[1]][["targets"]]

where 'fl' is the name of your BAM file.
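
The second check could look roughly like the sketch below. Its assumptions: 'fl' points at the example BAM file shipped with Rsamtools rather than your own, the 1000-base window is an arbitrary choice, and the code is written against a recent Rsamtools where 'which' accepts a GRanges object. The index file must sit alongside the BAM file for a range query to work.

```r
library(Rsamtools)
library(GenomicRanges)

# Stand-in for your own BAM file: the example file shipped with Rsamtools.
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Reference sequence names and lengths, taken from the BAM header.
targets <- scanBamHeader(fl)[[1]][["targets"]]
first_seq <- names(targets)[1]

# Restrict the query to the start of the first reference sequence.
# The 1000-base window is illustrative, capped at the sequence length.
which <- GRanges(first_seq, IRanges(1, min(1000L, targets[[1]])))
param <- ScanBamParam(which = which, what = c("rname", "pos", "qwidth"))

# A range query relies on the index; one list element is returned per range.
res <- scanBam(fl, param = param)
n_reads <- length(res[[1]][["pos"]])
n_reads
```

If this returns promptly on your file but a query further along the same file hangs, that would point towards a problem with seeking in the index.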
If this doesn't provide any hint then please include a minimal script
sufficient to reproduce your problem. It would be very helpful to point
to a publicly available BAM file generated by the same tools as you are
using.
Martin
> Asta
>
> sessionInfo()
> R version 2.11.1 (2010-05-31)
> x86_64-apple-darwin9.8.0
>
> locale:
> [1] C/UTF-8/C/C/C/C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Rsamtools_1.0.8 Biostrings_2.16.9 GenomicRanges_1.0.7
> [4] IRanges_1.6.11
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.8.0
>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793