[Bioc-devel] Views of a MaskedDNAstring in Biostrings
Sean Davis
sdavis2 at mail.nih.gov
Wed Jun 4 22:17:22 CEST 2008
On Wed, Jun 4, 2008 at 3:17 PM, Herve Pages <hpages at fhcrc.org> wrote:
> Hi Sean,
>
> Sean Davis wrote:
>>
>> Herve,
>>
>> I have been playing with the new version of Biostrings--very nice,
>> indeed! Is it possible to generate a view on a MaskedDNAString object
>> using the newest version of Biostrings? I tried using views(.....) on
>> human chr1 and got back a DNAString object rather than a
>> MaskedDNAString object.
>
> No, strictly speaking, you cannot create views on a MaskedDNAString
> object. You can still call views() on a MaskedDNAString object but
> the masks will be dropped so you will get a set of views on the
> original (unmasked) sequence in return.
> The XStringViews container has been around for a while and its
> 'subject' slot has been of type XString from the beginning (there
> has been some renaming in the meantime e.g. BString -> XString and
> BStringViews -> XStringViews, but these containers remain basically
> the same).
> We could change the definition of the XStringViews class to allow
> the 'subject' slot to be a MaskedXString object (in addition to being
> an XString object), or we could introduce a new class (MaskedXStringViews)
> that would extend the XStringViews class. I think I prefer the latter,
> but I'm not sure yet; I need to think about it a bit more.
> But having a container for storing views on a MaskedXString object
> will add yet another level of complexity, and there are many details
> that will need to be examined to make this new container fit nicely
> into the global picture.
>
> In the meantime, you can work around this by using "hard masking".
> If you inject a hard mask in your sequence:
>
> > mask0 <- Mask(mask.width=29, start=c(3, 10, 25), width=c(6, 8, 5))
> > x <- DNAString("ACACAACTAGATAGNACTNNGAGAGACGC")
> > masks(x) <- mask0
> > x
> 29-letter "MaskedDNAString" instance (# for masking)
> seq: AC######A########TNNGAGA#####
> masks:
>   maskedwidth maskedratio active
> 1          19   0.6551724   TRUE
> > y <- injectHardMask(x)
> > y
> 29-letter "DNAString" instance
> seq: AC++++++A++++++++TNNGAGA+++++
>
> Then, creating views on it keeps the hard mask information:
>
> > views(y, 1:7, 23:29)
> Views on a 29-letter DNAString subject
> subject: AC++++++A++++++++TNNGAGA+++++
> views:
>     start end width
> [1]     1  23    23 [AC++++++A++++++++TNNGAG]
> [2]     2  24    23 [C++++++A++++++++TNNGAGA]
> [3]     3  25    23 [++++++A++++++++TNNGAGA+]
> [4]     4  26    23 [+++++A++++++++TNNGAGA++]
> [5]     5  27    23 [++++A++++++++TNNGAGA+++]
> [6]     6  28    23 [+++A++++++++TNNGAGA++++]
> [7]     7  29    23 [++A++++++++TNNGAGA+++++]
>
> This works because, with a hard mask, the masking letter ('+', which
> belongs to the DNAString alphabet) is actually *in* the sequence.
>
> Hope this helps,
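For readers without Biostrings at hand, the hard-masking idea can be sketched in base R. This is a minimal illustration, not the Biostrings implementation: `hard_mask()` and `take_views()` are hypothetical helpers that overwrite masked positions with '+' and then extract fixed windows with `substring()`, reproducing the sequence and views from the transcript above.

```r
# Hypothetical base-R helpers (NOT part of Biostrings) that mimic hard masking:
# masked letters are overwritten with '+', so they stay visible in any
# substring ("view") taken afterwards.

# Replace every position covered by the ranges (start, width) with mask_letter.
hard_mask <- function(s, start, width, mask_letter = "+") {
  chars <- strsplit(s, "")[[1]]
  for (i in seq_along(start)) {
    idx <- start[i] + seq_len(width[i]) - 1L  # positions start..start+width-1
    chars[idx] <- mask_letter
  }
  paste(chars, collapse = "")
}

# Extract s[start[k]..end[k]] for each k (substring() is vectorized).
take_views <- function(s, start, end) {
  substring(s, start, end)
}

x <- "ACACAACTAGATAGNACTNNGAGAGACGC"
y <- hard_mask(x, start = c(3, 10, 25), width = c(6, 8, 5))
print(y)  # "AC++++++A++++++++TNNGAGA+++++"
v <- take_views(y, 1:7, 23:29)
print(v)
```

Because the mask letter is part of the string itself, every window inherits it automatically, which is the same property Hervé describes for `injectHardMask()`.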
This solution will do just fine for what I need. The other route I
considered was to use coverage() to get an integer representation of
the mask and use it to determine which bases within a view are masked.
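The coverage-based route can be sketched in base R as well. This is an assumption-laden stand-in, not the IRanges/Biostrings `coverage()` API: the hypothetical helper `mask_coverage()` counts, for each position, how many mask ranges cover it, and the hypothetical `masked_per_view()` counts covered positions inside each view, using the same mask and views as the example above.

```r
# Hypothetical base-R stand-in for coverage() (NOT the Biostrings/IRanges API):
# for each position 1..n, count how many mask ranges (start, width) cover it.
mask_coverage <- function(n, start, width) {
  cov <- integer(n)
  for (i in seq_along(start)) {
    idx <- start[i] + seq_len(width[i]) - 1L
    cov[idx] <- cov[idx] + 1L
  }
  cov
}

# Number of masked positions (coverage > 0) inside each view start[k]..end[k].
masked_per_view <- function(cov, start, end) {
  mapply(function(s, e) sum(cov[s:e] > 0L), start, end)
}

cov <- mask_coverage(29L, start = c(3, 10, 25), width = c(6, 8, 5))
print(sum(cov > 0L))                      # 19, the total masked width
print(masked_per_view(cov, 1:7, 23:29))   # 14 14 15 15 15 15 15
```

Unlike hard masking, this keeps the original letters intact and stores the mask separately, at the cost of doing the bookkeeping per view.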
Sean
>>
>> Thanks,
>> Sean
>>
>>
>>> sessionInfo()
>>
>> R version 2.8.0 Under development (unstable) (2008-05-12 r45677)
>> i386-apple-darwin8.10.1
>>
>> locale:
>> C
>>
>> attached base packages:
>> [1] tools     stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] BSgenome.Hsapiens.UCSC.hg18_1.3.7 BSgenome_1.9.3
>> [3] Biostrings_2.9.17                 Biobase_2.1.3
>
>