[Bioc-sig-seq] strandedness bug in Rsamtools?

Vincent Carey stvjc at channing.harvard.edu
Sat Feb 6 19:23:45 CET 2010


using a bam file in leeBamSet/inst/bam (in svn experimentData, not yet released)

> wt5 = scanBam("isowt5_13e.bam")
> p1 = wt5[[1]]$pos[1]
> p1
[1] 799975
> str1 = wt5[[1]]$strand[1]
> str1
[1] +
Levels: + - *
> rd1 = wt5[[1]]$seq[1]
> rd1
  A DNAStringSet instance of length 1
    width seq
[1]    36 GGAGGATAGAGTGGTAACTAAAAAGACTTGGATAAC
> rd1 = wt5[[1]]$seq[[1]]
> rd1
  36-letter "DNAString" instance
seq: GGAGGATAGAGTGGTAACTAAAAAGACTTGGATAAC
> library(BSgenome.Scerevisiae.UCSC.sacCer2)
> sc13 = Scerevisiae$chrXIII
> matchPattern(rd1, sc13)
  Views on a 924429-letter DNAString subject
subject: CCACACACACACCACACCCACACCACACCCACAC...GTGGGTGTGGTGTGGGTGTGGTGTGTGTGTGGGG
views: NONE
> matchPattern(reverseComplement(rd1), sc13)
  Views on a 924429-letter DNAString subject
subject: CCACACACACACCACACCCACACCACACCCACAC...GTGGGTGTGGTGTGGGTGTGGTGTGTGTGTGGGG
views:
     start    end width
[1] 799975 800010    36 [GTTATCCAAGTCTTTTTAGTTACCACTCTATCCTCC]

So although the read is marked + with scanBam and has  flag 16

bash-3.2$ samtools view isowt5*bam | head -2
GA-EAS46_1_209DH:5:14:206:926   16      Scchr13 799975  25      36M
 *       0       0       GTTATCCAAGTCTTTTTAGTTACCACTCTATCCTCC
JONSHWfURPJUgXcadYKhfUhh^hhhhhhhh^hh    NM:i:0  X0:i:1  MD:Z:36

i need to take reverseComplement to match to the reference

the locale is C

> sessionInfo()
R version 2.11.0 Under development (unstable) (2010-01-07 r50940)
i386-apple-darwin9.8.0

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base

other attached packages:
[1] BSgenome.Scerevisiae.UCSC.sacCer2_1.3.16
[2] Rsamtools_0.1.29
[3] BSgenome_1.15.4
[4] Biostrings_2.15.18
[5] IRanges_1.5.39
[6] weaver_1.13.0
[7] codetools_0.2-2
[8] digest_0.4.1

loaded via a namespace (and not attached):
[1] Biobase_2.7.0



On Sat, Feb 6, 2010 at 11:40 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 02/06/2010 07:29 AM, Steve Lianoglou wrote:
>> Hi,
>>
>> On Sat, Feb 6, 2010 at 4:17 AM, Steve Lianoglou
>> <mailinglist.honeypot at gmail.com> wrote:
>>
>>> I think there might be a bug in Rsamtools. I have reason to believe
>>> that the strand information that's being returned from
>>> scanBam(...)$strand is flipped, ie. a read that should be on the '+'
>>> strand is being reported as aligning to the '-' strand.
>>
>> I just wanted to include an example of what I'm seeing (sorry I didn't
>> before, the need for shut-eye was competing with the desire to write a
>> better message :-)
>>
>> Anyway, here's what I get from my bamfile:
>>
>> $ samtools view my-sorted.bam chr2:2355-2360
>> 1805160 16      chr2    2355    255     20M     *       0       0
>>  GTCTTACTTTGATTATGATC    IIIIIIIIIIIIIIIIIIII    XA:i:0  MD:Z:20
>> NM:i:0
>> 4681745 16      chr2    2355    255     20M     *       0       0
>>  GTCTTACTTTGATTATGATC    IIIIIIIIIIIIIIIIIIII    XA:i:0  MD:Z:20
>> NM:i:0
>>
>> I believe a value of 16 in the FLAG field indicates the negative strand ...
>>
>> Now in R:
>>
>> R> bf <- 'my-sorted.bam
>> R> r <- scanBam(bf,
>> param=ScanBamParam(which=RangesList(chr2=IRanges(start=2355,
>> end=2360))))[[1]]
>> R> r$pos
>> [1] 2355 2355
>> R> r$strand
>> [1] + +
>> Levels: + - *
>
> I notice that your levels are different from mine; what does
> levels(strand()) say? what's your locale? (sesssionInfo() ;) --
> Rsamtools and strand() in general was trying to be locale-independent,
> but this seems not to be the case...
>
> The workaround is likely to work in a "C" locale, ?Sys.setlocale
>
> Martin
>
>> If I'm still thinking about this straight, and it's a bug, maybe for
>> now I'll just swap the levels of the r$strand to be c('-', '+', '*')
>> when I get my reads back from scanBam.
>>
>> Thanks,
>> -steve
>>
>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>



More information about the Bioc-sig-sequencing mailing list