[Bioc-devel] Streaming through BAM and tabix files

Thomas Girke thomas.girke at ucr.edu
Mon Apr 16 02:46:25 CEST 2012


Great. I will test this out as soon as I can.

Thanks,

Thomas


On Sun, Apr 15, 2012 at 06:13:55PM +0000, Martin Morgan wrote:
> Thomas asked here
> 
> https://stat.ethz.ch/pipermail/bioconductor/2012-April/044928.html
> 
> about iterating ('streaming') through a BAM file. I added support for 
> this (and for tabix files) to Rsamtools 1.9.4. The idea is that one 
> creates a BamFile with argument yieldSize (similarly with TabixFile; the 
> field has actually been added to the RsamtoolsFile reference class), and 
> then scanBam / scanTabix respect that, returning at most yieldSize 
> records per call on the open file. From ?BamFile
> 
>       ## chunks of size 1000
>       bf <- open(BamFile(fl, yieldSize=1000))
>       while (nrec <- length(scanBam(bf)[[1]][[1]]))
>           cat("records:", nrec, "\n")
>       close(bf)
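
The loop above only reports the size of each chunk. As a minimal sketch of doing some work per chunk, the version below keeps a running total across chunks. It assumes the ex1.bam example file shipped with Rsamtools; the variable names (total, chunk, n) and the restriction to the "qname" field are illustrative choices, not part of the original example.

```r
library(Rsamtools)

## Assumption: the small example BAM file bundled with Rsamtools
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

## Read only the query names, to keep each chunk small
param <- ScanBamParam(what = "qname")

## Open the file so that successive scanBam() calls continue
## from where the previous call stopped
bf <- open(BamFile(fl, yieldSize = 1000))

total <- 0L
repeat {
    chunk <- scanBam(bf, param = param)[[1]]
    n <- length(chunk$qname)
    if (n == 0L) break      # no records left
    total <- total + n      # per-chunk work goes here
}
close(bf)

cat("total records:", total, "\n")
```

The file has to be opened explicitly, as in the original example, so that each call resumes where the previous one ended. Note that param here selects fields only; it contains no ranges.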
> 
> A consequence is that functions using scanBam internally automatically 
> gain yield-like behavior; thus
> 
>       bf <- open(BamFile(fl, yieldSize=1000))
>       while (length(ga <- readBamGappedAlignments(bf)))
>           cat("records:", length(ga), "\n")
>       close(bf)
> 
> similarly for VariantAnnotation::readVcf (after updating both Rsamtools 
> and VariantAnnotation).
> 
> Streaming is not yet supported for readBamGappedAlignmentPairs, or when 
> the param argument to scanBam / scanTabix contains ranges (it seems like 
> iteration when ranges are present should be range-based, rather than 
> record-based).
> 
> Martin
> -- 
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> 
> Location: M1-B861
> Telephone: 206 667-2793


