[Bioc-devel] Streaming through BAM and tabix files

Martin Morgan mtmorgan at fhcrc.org
Sun Apr 15 20:13:55 CEST 2012


Thomas asked here

https://stat.ethz.ch/pipermail/bioconductor/2012-April/044928.html

about iterating ('streaming') through a BAM file. I added support for 
this (and for tabix files) in Rsamtools 1.9.4. The idea is to create a 
BamFile (or, similarly, a TabixFile; the field was actually added to the 
RsamtoolsFile reference class) with the argument yieldSize; each call to 
scanBam / scanTabix on the open file then returns at most yieldSize 
records. From ?BamFile

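      library(Rsamtools)
      ## example BAM file shipped with Rsamtools
      fl <- system.file("extdata", "ex1.bam", package="Rsamtools")

      ## chunks of size 1000; the loop ends when a chunk has 0 records
      bf <- open(BamFile(fl, yieldSize=1000))
      while (nrec <- length(scanBam(bf)[[1]][[1]]))
          cat("records:", nrec, "\n")
      close(bf)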
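
For tabix files the pattern is the same. A minimal sketch, assuming the 
example.gtf.gz file that ships in the Rsamtools extdata directory and an 
arbitrary yieldSize of 100; the variable names tbx, res and total are 
illustrative only:

```r
library(Rsamtools)

## Assumption: example tabix-indexed file shipped with Rsamtools
fl <- system.file("extdata", "example.gtf.gz", package="Rsamtools")

## Read at most 100 records per call; stop when a chunk comes back empty
tbx <- open(TabixFile(fl, yieldSize=100))
total <- 0L
while (length(res <- scanTabix(tbx)[[1]])) {
    cat("records:", length(res), "\n")
    total <- total + length(res)
}
close(tbx)
cat("total records:", total, "\n")
```

The file has to be opened first, so that successive calls continue from 
the current position rather than starting over.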

A consequence is that functions using scanBam internally automatically 
gain yield-like behavior; thus

      bf <- open(BamFile(fl, yieldSize=1000))
      while (length(ga <- readBamGappedAlignments(bf)))
          cat("records:", length(ga), "\n")
      close(bf)

similarly for VariantAnnotation::readVcf (after updating both Rsamtools 
and VariantAnnotation).
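
A sketch of the readVcf case, assuming the chr22.vcf.gz example file and 
its tabix index from the VariantAnnotation extdata directory, the genome 
label "hg19", and an arbitrary yieldSize of 4000:

```r
library(Rsamtools)
library(VariantAnnotation)

## Assumption: tabix-indexed example VCF shipped with VariantAnnotation
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")

## Each readVcf call returns at most 4000 variants from the open file
tab <- open(TabixFile(fl, yieldSize=4000))
nvar <- 0L
while (nrow(vcf <- readVcf(tab, "hg19"))) {
    cat("variants:", nrow(vcf), "\n")
    nvar <- nvar + nrow(vcf)
}
close(tab)
cat("total variants:", nvar, "\n")
```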

Streaming is not yet supported for readBamGappedAlignmentPairs, or when 
the param argument to scanBam / scanTabix contains ranges (it seems like 
iteration when ranges are present should be range-based, rather than 
record-based).
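
In the meantime, range-based iteration can be approximated by looping 
over the ranges and issuing one scanBam call per range. A rough sketch, 
assuming an indexed BAM file (ex1.bam ships with ex1.bam.bai) and three 
arbitrary 500 bp windows on seq1; the windows and the variable names are 
illustrative only, and this is not how Rsamtools itself would do it:

```r
library(Rsamtools)
library(GenomicRanges)

## Assumption: example BAM file shipped with Rsamtools, with its index
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")

## Hypothetical windows to visit, one scanBam call each
windows <- GRanges("seq1", IRanges(c(1, 501, 1001), width=500))

bf <- open(BamFile(fl))
counts <- integer(length(windows))
for (i in seq_along(windows)) {
    param <- ScanBamParam(which=windows[i], what="qname")
    counts[i] <- length(scanBam(bf, param=param)[[1]][["qname"]])
    cat("range", i, "records:", counts[i], "\n")
}
close(bf)
```

A read overlapping two windows is returned for both, so the per-range 
counts can add up to more than the number of records in the file.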

Martin
-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


