[Bioc-devel] Streaming through BAM and tabix files
Martin Morgan
mtmorgan at fhcrc.org
Sun Apr 15 20:13:55 CEST 2012
Thomas asked here
https://stat.ethz.ch/pipermail/bioconductor/2012-April/044928.html
about iterating ('streaming') through a BAM file. I added support for
this (and for tabix files) to Rsamtools 1.9.4. The idea is that one
creates a BamFile (or a TabixFile; the yieldSize field was actually
added to the common RsamtoolsFile reference class) with argument
yieldSize; each subsequent call to scanBam / scanTabix on the open file
then returns at most yieldSize records, and a zero-length result signals
the end of the file. From ?BamFile:
library(Rsamtools)
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
## chunks of size 1000
bf <- open(BamFile(fl, yieldSize=1000))
while (nrec <- length(scanBam(bf)[[1]][[1]]))
    cat("records:", nrec, "\n")
close(bf)
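The same yieldSize idiom applies to tabix files. A minimal sketch, assuming a current Rsamtools is installed and using the example.gtf.gz file (with its .tbi index) that Rsamtools ships in extdata; the nlines vector is only for illustration:

```r
library(Rsamtools)

## Compressed, tabix-indexed GTF file shipped with Rsamtools
fl <- system.file("extdata", "example.gtf.gz", package = "Rsamtools",
                  mustWork = TRUE)

## Each scanTabix() call on the open file returns at most 100 lines
tbx <- open(TabixFile(fl, yieldSize = 100))
nlines <- integer(0)
while (length(res <- scanTabix(tbx)[[1]])) {
    ## 'res' is a character vector of raw, tab-delimited GTF lines
    nlines <- c(nlines, length(res))
}
close(tbx)
nlines
```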
A consequence is that functions that call scanBam internally
automatically gain the same streaming behavior; for instance
bf <- open(BamFile(fl, yieldSize=1000))
while (length(ga <- readBamGappedAlignments(bf)))
    cat("records:", length(ga), "\n")
close(bf)
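The same holds for user-written code: a function that calls scanBam on an open BamFile reads one chunk per call, so a running summary can be accumulated without holding the whole file in memory. A minimal sketch, assuming a current Rsamtools and its bundled ex1.bam; highMapqCount and the MAPQ threshold of 30 are hypothetical choices for illustration:

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE)

## Hypothetical helper: because it calls scanBam() on an open BamFile,
## each call reads only the next chunk (at most yieldSize records)
highMapqCount <- function(bf, minMapq = 30L) {
    param <- ScanBamParam(what = "mapq")     # no 'which' ranges
    mapq <- scanBam(bf, param = param)[[1]][["mapq"]]
    c(records = length(mapq),
      high = sum(mapq >= minMapq, na.rm = TRUE))   # NA = MAPQ unavailable
}

bf <- open(BamFile(fl, yieldSize = 1000))
total <- c(records = 0L, high = 0L)
repeat {
    chunk <- highMapqCount(bf)
    if (chunk[["records"]] == 0L)
        break                                # file exhausted
    total <- total + chunk                   # running per-file summary
}
close(bf)
total
```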
The same applies to VariantAnnotation::readVcf (after updating both
Rsamtools and VariantAnnotation).
Streaming is not yet supported for readBamGappedAlignmentPairs, or when
the param argument to scanBam / scanTabix contains ranges (when ranges
are present, it seems that iteration should be range-based rather than
record-based).
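Until range-based iteration exists, one workaround (a sketch, not part of this change) is to visit the ranges one at a time yourself, passing each as the which argument of ScanBamParam on a BamFile created without yieldSize. Here the ranges are simply whole reference sequences taken from the header via scanBamHeader; the ranges and perRange names are illustrative assumptions, and the bundled ex1.bam index (.bai) is required:

```r
library(Rsamtools)
library(GenomicRanges)

fl <- system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE)

## One range per reference sequence, taken from the BAM header
seqlen <- scanBamHeader(fl)[[1]][["targets"]]
ranges <- GRanges(names(seqlen), IRanges(1L, seqlen))

## Visit the ranges one at a time; each scanBam() call returns only the
## records overlapping the current range
bf <- BamFile(fl)                            # no yieldSize
perRange <- integer(length(ranges))
for (i in seq_along(ranges)) {
    param <- ScanBamParam(which = ranges[i], what = "pos")
    perRange[i] <- length(scanBam(bf, param = param)[[1]][["pos"]])
}
names(perRange) <- names(seqlen)
perRange
```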
Martin
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793