[Bioc-sig-seq] strandedness bug in Rsamtools?

Martin Morgan mtmorgan at fhcrc.org
Sat Feb 6 17:36:13 CET 2010


Hi Steve --

On 02/06/2010 07:29 AM, Steve Lianoglou wrote:
> Hi,
> 
> On Sat, Feb 6, 2010 at 4:17 AM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
> 
>> I think there might be a bug in Rsamtools. I have reason to believe
>> that the strand information that's being returned from
>> scanBam(...)$strand is flipped, ie. a read that should be on the '+'
>> strand is being reported as aligning to the '-' strand.
> 
> I just wanted to include an example of what I'm seeing (sorry I didn't
> before, the need for shut-eye was competing with the desire to write a
> better message :-)
> 
> Anyway, here's what I get from my bamfile:
> 
> $ samtools view my-sorted.bam chr2:2355-2360
> 1805160 16      chr2    2355    255     20M     *       0       0
>  GTCTTACTTTGATTATGATC    IIIIIIIIIIIIIIIIIIII    XA:i:0  MD:Z:20
> NM:i:0
> 4681745 16      chr2    2355    255     20M     *       0       0
>  GTCTTACTTTGATTATGATC    IIIIIIIIIIIIIIIIIIII    XA:i:0  MD:Z:20
> NM:i:0
> 
> I believe a value of 16 in the FLAG field indicates the negative strand ...

It's not a simple sign flip

fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
bam <- scanBam(fl)[[1]]
## samtools view ex1.bam > ex1.sam
tbl <- read.table("~/tmp/ex1.sam", fill=TRUE, sep="\t", quote="")


 all.equal(as.character(tbl[[1]]), bam$qname)
[1] TRUE
> bitAnd(head(tbl[[2]], 20), 0x0010)
 [1]  0  0  0  0  0  0  0  0 16  0  0  0  0  0  0  0  0  0  0  0
> head(bam$strand, 20)
 [1] + + + + + + + + - + + + + + + + + + + +
Levels: - + *
> table(bam$strand, useNA="always")

   -    +    * <NA>
1624 1647    0   36
> table(bitAnd(tbl[[2]][!is.na(bam$stran)], 16)==0, useNA="always")

FALSE  TRUE  <NA>
 1624  1647     0

though these tests don't cover the flag=16 case. Will investigate further.

Martin

> 
> Now in R:
> 
> R> bf <- 'my-sorted.bam
> R> r <- scanBam(bf,
> param=ScanBamParam(which=RangesList(chr2=IRanges(start=2355,
> end=2360))))[[1]]
> R> r$pos
> [1] 2355 2355
> R> r$strand
> [1] + +
> Levels: + - *
> 
> If I'm still thinking about this straight, and it's a bug, maybe for
> now I'll just swap the levels of the r$strand to be c('-', '+', '*')
> when I get my reads back from scanBam.
> 
> Thanks,
> -steve
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list