[BioC] deqseq_count and BWA-based SAM files
Simon Anders
anders at embl.de
Tue Dec 13 23:12:14 CET 2011
Hi Wyatt
On 2011-12-13 22:06, Wyatt McMahon wrote:
> Unfortunately, none of these has worked. I've used both Shan's
> script as well as samtools and am still having the same problem.
> Despite everything being very nicely sorted, I am still getting the same
> error message.
Do you get the error for every read or only for some? The latter is
typically harmless.
To explain: The way the SAM format stores paired-end reads is, IMO,
rather unfortunate. Each mate gets its own SAM line, and the two SAM
lines can be far apart in the file. Once you sort by name, the mates
will be close to each other (although they may still be interleaved if
there is more than one alignment for the pair).
HTSeq takes a chunk of adjacent lines with the same read ID and arranges
them into matching pairs, using the MRNM and MPOS columns (called RNEXT
and PNEXT in the new terminology). If this does not work, the warning is
displayed.
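
To make the pairing step concrete, here is a minimal Python sketch of the
idea. It is not HTSeq's actual code: the function names (parse_sam_line,
pair_mates) and the dictionary layout are made up for illustration, and it
assumes name-sorted SAM alignment lines with the header lines already
removed.

```python
from itertools import groupby


def parse_sam_line(line):
    """Pull out only the SAM columns needed for pairing (illustrative helper)."""
    f = line.rstrip("\n").split("\t")
    return {
        "qname": f[0],
        "flag": int(f[1]),
        "rname": f[2],
        "pos": int(f[3]),
        "rnext": f[6],       # MRNM in the old terminology
        "pnext": int(f[7]),  # MPOS in the old terminology
    }


def mate_ref(rec):
    # "=" in RNEXT means "same reference as RNAME"
    return rec["rname"] if rec["rnext"] == "=" else rec["rnext"]


def points_to(a, b):
    # True if a's mate columns name the position at which b is aligned
    return mate_ref(a) == b["rname"] and a["pnext"] == b["pos"]


def pair_mates(records):
    """Group adjacent records by read ID and match mates within each group.

    Returns (pairs, orphans); orphans are records whose mate was not found.
    """
    pairs, orphans = [], []
    for _, group in groupby(records, key=lambda r: r["qname"]):
        pending = []
        for rec in group:
            match = next(
                (o for o in pending
                 if points_to(rec, o) and points_to(o, rec)
                 # one must be the first mate (0x40), the other the second
                 and (rec["flag"] & 0x40) != (o["flag"] & 0x40)),
                None,
            )
            if match is None:
                pending.append(rec)
            else:
                pending.remove(match)
                pairs.append((match, rec))
        orphans.extend(pending)
    return pairs, orphans
```

Every record that ends up in orphans corresponds to a case where a tool
working this way would have to give up on the read and report it.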
Often, if you do some filtering, you might remove the SAM line for a read
but leave in the line for its mate. HTSeq will simply skip such reads
but display the warning you saw. If the warnings bother you, you can
silence them (along with all other warnings) with the '-q' option.
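
If you want to check whether the warnings come from such orphaned mates,
a rough count along the following lines may help. This is only a sketch:
count_orphaned_reads is a made-up name, and it relies solely on the
first-mate (0x40) and second-mate (0x80) bits of the FLAG column.

```python
from collections import defaultdict


def count_orphaned_reads(sam_lines):
    """Return (number of read IDs, number of read IDs missing one mate)."""
    seen = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):   # skip SAM header lines
            continue
        fields = line.split("\t")
        if len(fields) < 2:        # skip blank or malformed lines
            continue
        qname, flag = fields[0], int(fields[1])
        mates = seen[qname]        # registers the read ID even if no bit is set
        if flag & 0x40:
            mates.add("first")
        if flag & 0x80:
            mates.add("second")
    orphaned = sum(1 for mates in seen.values() if len(mates) != 2)
    return len(seen), orphaned
```

To run it on a file, pass the open file handle, e.g.
count_orphaned_reads(open("aligned.sam")), where "aligned.sam" is a
placeholder file name. If the second number is small compared to the
first, the warnings are most likely the harmless kind described above.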
Simon