[Bioc-devel] Streaming through BAM and tabix files
Thomas Girke
thomas.girke at ucr.edu
Mon Apr 16 02:46:25 CEST 2012
Great. I will test this out as soon as I can.
Thanks,
Thomas
On Sun, Apr 15, 2012 at 06:13:55PM +0000, Martin Morgan wrote:
> Thomas asked here
>
> https://stat.ethz.ch/pipermail/bioconductor/2012-April/044928.html
>
> about iterating ('streaming') through a BAM file. I added support for
> this (and for tabix files) to Rsamtools 1.9.4. The idea is that one
> creates a BamFile (or a TabixFile; the field was actually added to the
> RsamtoolsFile reference class) with the argument yieldSize; once the
> file is opened, each call to scanBam / scanTabix returns at most
> yieldSize records, and successive calls return successive chunks. From
> ?BamFile:
>
> library(Rsamtools)
> fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
> ## chunks of size 1000
> bf <- open(BamFile(fl, yieldSize=1000))
> while (nrec <- length(scanBam(bf)[[1]][[1]]))
>     cat("records:", nrec, "\n")
> close(bf)
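A minimal sketch of the same loop that keeps only running totals in memory, assuming Rsamtools >= 1.9.4 and the ex1.bam example file shipped in Rsamtools' extdata directory. The helper stream_count is hypothetical, written for illustration only; ScanBamParam(what = "qname") restricts each chunk to the read names so that little data is read per call, and on.exit() makes sure the file is closed even if an error occurs.

```r
## Sketch: stream a BAM file in chunks of at most 'yieldSize' records,
## keeping only running totals in memory (assumes Rsamtools >= 1.9.4).
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package = "Rsamtools",
                  mustWork = TRUE)

## 'stream_count' is a hypothetical helper, not part of Rsamtools
stream_count <- function(path, yieldSize = 1000L) {
    bf <- BamFile(path, yieldSize = yieldSize)
    open(bf)                               # BamFile is a reference class
    on.exit(close(bf))                     # close the file even on error
    param <- ScanBamParam(what = "qname")  # read only the query names
    total <- 0L
    chunk_sizes <- integer(0)
    repeat {
        n <- length(scanBam(bf, param = param)[[1]]$qname)
        if (n == 0L)                       # an empty chunk means end of file
            break
        total <- total + n
        chunk_sizes <- c(chunk_sizes, n)
    }
    list(total = total, chunk_sizes = chunk_sizes)
}

res <- stream_count(fl, yieldSize = 1000L)
cat("total records:", res$total, "in", length(res$chunk_sizes), "chunks\n")
```

Peak memory is bounded by one chunk rather than the whole file; only the running total and the vector of chunk sizes grow.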
>
> A consequence is that functions using scanBam internally automatically
> gain this chunked ('yield') behavior; for example
>
> bf <- open(BamFile(fl, yieldSize=1000))
> while (length(ga <- readBamGappedAlignments(bf)))
>     cat("records:", length(ga), "\n")
> close(bf)
>
> The same holds for VariantAnnotation::readVcf (after updating both
> Rsamtools and VariantAnnotation).
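A sketch of the corresponding loop for a bgzip-compressed, tabix-indexed VCF file, assuming a VariantAnnotation version with yieldSize support and its example file chr22.vcf.gz; the genome label "hg19" and the chunk size of 500 are arbitrary choices for illustration. As with scanBam, readVcf on an opened TabixFile returns successive chunks, and a zero-row result signals the end of the file.

```r
## Sketch: stream a tabix-indexed VCF file with readVcf, chunk by chunk
## (assumes VariantAnnotation and its example file chr22.vcf.gz).
library(VariantAnnotation)

vcf_file <- system.file("extdata", "chr22.vcf.gz",
                        package = "VariantAnnotation", mustWork = TRUE)

tab <- TabixFile(vcf_file, yieldSize = 500L)
open(tab)
n_total <- 0L
## each call returns at most 500 variants; 0 rows once the file is exhausted
while (nrow(vcf <- readVcf(tab, "hg19"))) {
    n_total <- n_total + nrow(vcf)
    cat("variants in chunk:", nrow(vcf), "\n")
}
close(tab)
cat("total variants:", n_total, "\n")
```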
>
> Streaming is not yet supported for readBamGappedAlignmentPairs, or when
> the param argument to scanBam / scanTabix contains ranges (it seems that
> iteration when ranges are present should be range-based rather than
> record-based).
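Range-based iteration can already be written by hand: issue one scanBam call per range via ScanBamParam(which = ...). The sketch below is an illustration, not a built-in Rsamtools iterator; it assumes the ex1.bam example file (with its index) and uses one range per reference sequence taken from the BAM header. In practice the ranges could instead be genomic windows or regions of interest.

```r
## Sketch: range-based (rather than record-based) iteration over a BAM
## file, one scanBam() call per range via ScanBamParam(which = ...).
library(Rsamtools)
library(GenomicRanges)

fl <- system.file("extdata", "ex1.bam", package = "Rsamtools",
                  mustWork = TRUE)

## One range per reference sequence, with lengths from the BAM header
targets <- scanBamHeader(fl)[[1]]$targets
ranges <- GRanges(names(targets), IRanges(1L, unname(targets)))

counts <- integer(length(ranges))
for (i in seq_along(ranges)) {
    param <- ScanBamParam(which = ranges[i], what = "qname")
    ## only records overlapping range i are read into memory
    counts[i] <- length(scanBam(fl, param = param)[[1]]$qname)
}
names(counts) <- names(targets)
counts
```

Memory use is bounded by the largest range rather than by a fixed record count, which is the distinction drawn above between range-based and record-based iteration.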
>
> Martin
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793