[Bioc-devel] Reading FASTQ/BAM from open file handle?

Martin Morgan mtmorgan at fhcrc.org
Wed Dec 5 01:37:50 CET 2012


On 12/04/2012 03:17 PM, Ryan C. Thompson wrote:
> Perfect, that's just what I wanted for Fastq files. Is there no R facility for
> reading unindexed bam?

Well, playing around a little, I did this in one terminal:

~$ mkfifo myfifo
~$ cat tmp/bam/ex1.bam > myfifo

And then in R

   library(Rsamtools)
   scanBam("~/myfifo")  # or, e.g., readBamGappedAlignments("~/myfifo")

so I guess there are hacks. The usual way to iterate through a BAM file in chunks is with a BamFile and a yieldSize

   bf = BamFile("myfifo", yieldSize=100000L)

but opening it requires an index

 > open(bf)
Error in open.BamFile(bf) : failed to load BAM index
   file: myfifo
In addition: Warning message:
In open.BamFile(bf) : [bam_index_load] fail to load BAM index.

so without an index you'd be stuck parsing the whole file in a single scanBam() call...
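
For what it's worth, the FIFO trick can be wrapped into a small script so the writer and the cleanup are handled in one place. This is only a sketch under stated assumptions: gzip stands in for the real decompressor (the quip command line is not shown here and would need to be substituted), cat stands in for the R reader, and the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: stream decompressed data to a path-based reader through a FIFO.
# Assumptions: gzip is a stand-in for the real decompressor, cat is a
# stand-in for the R reader, and all file names are hypothetical.
set -eu

workdir=$(mktemp -d)
fifo="$workdir/myfifo"

# Stand-in for a compressed input file.
printf 'record1\nrecord2\n' | gzip -c > "$workdir/data.gz"

mkfifo "$fifo"

# Writer: decompress into the FIFO in the background. Opening a FIFO for
# writing blocks until a reader opens the other end.
gzip -dc "$workdir/data.gz" > "$fifo" &

# Reader: consumes the stream once, front to back. In the R session this
# role would be played by scanBam() pointed at the FIFO path.
result=$(cat "$fifo")

wait                # reap the background writer
rm -rf "$workdir"   # remove the FIFO and the temporary files

printf '%s\n' "$result"
```

The reader only ever sees a sequential stream, so anything that needs to seek (an index lookup, for example) will not work against the FIFO.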

Martin

>
> On Tue 04 Dec 2012 02:47:56 PM PST, Martin Morgan wrote:
>> On 12/04/2012 01:27 PM, Ryan C. Thompson wrote:
>>> Hi all,
>>>
>>> I'm currently experimenting with using quip
>>> (https://github.com/dcjones/quip#readme) to save disk space when storing
>>> FASTQ and BAM files. One thing that would be nice is to read
>>> quip-compressed FASTQ or BAM files directly into R. Obviously direct
>>> support for reading quip compression would be ideal, but in the short
>>> term, quip supports decompression to standard output, so if I could have
>>> R read FASTQ or BAM data from an open file handle, I could pipe the
>>> decompressed output to R's FASTQ or BAM reader functions. Does anyone
>>> know if this is possible?
>>
>> ShortRead::FastqStreamer works with R connections, so for instance
>> after example(FastqStreamer)
>>
>>   cmd = sprintf("cat %s", fl)
>>   p = pipe(cmd)
>>   strm = FastqStreamer(p, 50)
>>   yield(strm)
>>   yield(strm)
>>
>> Rsamtools::scanBam is really expecting to read from an (indexed) bam
>> file with random access.
>>
>> Martin
>>
>>>
>>> -Ryan Thompson
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


