[BioC] Rsamtools: Read a bam line by line
Alex Gutteridge
alexg at ruggedtextile.com
Sat Jul 7 10:10:58 CEST 2012
Hi,
I have an application where I have mapped reads to two different
reference sequences and I have these two mappings as bam files. I now
want to see for each read whether it mapped to one, both or neither of
the two references (and where the mapping is to both see if one was
better than the other). All my reading of the samtools docs suggested
that there is no facility for indexing by read name as opposed to
genomic coordinates, so the first way that occured to me to do this was
to sort each bam by read name (which is possible using samtools) and
then step through the two bams in parallel checking the alignment
information for each read. There are ~120M reads, so reading the two
bams into memory all in one go is not practical, but from what I can see
Rsamtools doesn't have functions for reading bams either line by line or
by chunks of lines. All the ScanBamParam arguments seem to refer to
reading chunks by genomic ranges - this isn't appropriate for my
application as the genomic ranges between the two references are not
necessarily comparable.
So my question is really two-fold:
1) Am I right in my reading of the Rsamtools docs? Are there any
lower level functions for reading bams I am ignoring that could do what
I want?
2) My thought now is to just make sams from the sorted bams and step
through using readLines() and my own sam parsing code. Is there an
alternative strategy I have overlooked?
--
Alex Gutteridge
More information about the Bioconductor
mailing list