[Bioc-sig-seq] strandedness bug in Rsamtools?

Martin Morgan mtmorgan at fhcrc.org
Sat Feb 6 17:40:53 CET 2010


On 02/06/2010 07:29 AM, Steve Lianoglou wrote:
> Hi,
> 
> On Sat, Feb 6, 2010 at 4:17 AM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
> 
>> I think there might be a bug in Rsamtools. I have reason to believe
>> that the strand information that's being returned from
>> scanBam(...)$strand is flipped, ie. a read that should be on the '+'
>> strand is being reported as aligning to the '-' strand.
> 
> I just wanted to include an example of what I'm seeing (sorry I didn't
> before, the need for shut-eye was competing with the desire to write a
> better message :-)
> 
> Anyway, here's what I get from my bamfile:
> 
> $ samtools view my-sorted.bam chr2:2355-2360
> 1805160 16      chr2    2355    255     20M     *       0       0
>  GTCTTACTTTGATTATGATC    IIIIIIIIIIIIIIIIIIII    XA:i:0  MD:Z:20
> NM:i:0
> 4681745 16      chr2    2355    255     20M     *       0       0
>  GTCTTACTTTGATTATGATC    IIIIIIIIIIIIIIIIIIII    XA:i:0  MD:Z:20
> NM:i:0
> 
> I believe a value of 16 in the FLAG field indicates the negative strand ...
> 
> Now in R:
> 
> R> bf <- 'my-sorted.bam
> R> r <- scanBam(bf,
> param=ScanBamParam(which=RangesList(chr2=IRanges(start=2355,
> end=2360))))[[1]]
> R> r$pos
> [1] 2355 2355
> R> r$strand
> [1] + +
> Levels: + - *

I notice that your levels are different from mine; what does
levels(strand()) say? what's your locale? (sesssionInfo() ;) --
Rsamtools and strand() in general was trying to be locale-independent,
but this seems not to be the case...

The workaround is likely to work in a "C" locale, ?Sys.setlocale

Martin

> If I'm still thinking about this straight, and it's a bug, maybe for
> now I'll just swap the levels of the r$strand to be c('-', '+', '*')
> when I get my reads back from scanBam.
> 
> Thanks,
> -steve
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list