[Bioc-devel] Views of a MaskedDNAString in Biostrings

Sean Davis sdavis2 at mail.nih.gov
Wed Jun 4 22:17:22 CEST 2008


On Wed, Jun 4, 2008 at 3:17 PM, Herve Pages <hpages at fhcrc.org> wrote:
> Hi Sean,
>
> Sean Davis wrote:
>>
>> Herve,
>>
>> I have been playing with the new version of Biostrings--very nice,
>> indeed!  Is it possible to generate a view on a MaskedDNAString object
>> using the newest version of Biostrings?  I tried using views(.....) on
>> human chr1 and got back a DNAString object rather than a
>> MaskedDNAString object.
>
> No, strictly speaking, you cannot create views on a MaskedDNAString
> object. You can still call views() on a MaskedDNAString object but
> the masks will be dropped so you will get a set of views on the
> original (unmasked) sequence in return.
> The XStringViews container has been around for a while and its
> 'subject' slot has been of type XString from the beginning (there
> has been some renaming in the meantime e.g. BString -> XString and
> BStringViews -> XStringViews, but these containers remain basically
> the same).
> We could change the definition of the XStringViews class to allow
> the 'subject' slot to be a MaskedXString object (in addition to being
> an XString object) or we could introduce a new class (MaskedXStringViews)
> that would extend the XStringViews class. Maybe I like the latter better,
> not sure yet, I would need to think a little bit more about it.
> But having a container for storing views on a MaskedXString object
> will add yet another level of complexity and there are lots of details
> that will need to be examined to make this new container fit nicely
> into the global picture.
>
> In the meantime, you can work around this by using "hard masking".
> If you inject a hard mask in your sequence:
>
>  > mask0 <- Mask(mask.width=29, start=c(3, 10, 25), width=c(6, 8, 5))
>  > x <- DNAString("ACACAACTAGATAGNACTNNGAGAGACGC")
>  > masks(x) <- mask0
>  > x
>    29-letter "MaskedDNAString" instance (# for masking)
>  seq: AC######A########TNNGAGA#####
>  masks:
>    maskedwidth maskedratio active
>  1          19   0.6551724   TRUE
>  > y <- injectHardMask(x)
>  > y
>    29-letter "DNAString" instance
>  seq: AC++++++A++++++++TNNGAGA+++++
>
> Then, creating views on it keeps the hard mask information:
>
>  > views(y, 1:7, 23:29)
>    Views on a 29-letter DNAString subject
>  subject: AC++++++A++++++++TNNGAGA+++++
>  views:
>      start end width
>  [1]     1  23    23 [AC++++++A++++++++TNNGAG]
>  [2]     2  24    23 [C++++++A++++++++TNNGAGA]
>  [3]     3  25    23 [++++++A++++++++TNNGAGA+]
>  [4]     4  26    23 [+++++A++++++++TNNGAGA++]
>  [5]     5  27    23 [++++A++++++++TNNGAGA+++]
>  [6]     6  28    23 [+++A++++++++TNNGAGA++++]
>  [7]     7  29    23 [++A++++++++TNNGAGA+++++]
>
> This is because, with a hard mask, the masking letter ('+', which
> belongs to the DNAString alphabet) is really *in* the sequence.
>
> Hope this helps,

This solution will do just fine for what I need.  The other route I
thought of is to use coverage() to get an integer representation of
the mask and use that for determining masked bases relative to a view.
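
A minimal base-R sketch of that second route, for illustration only. It
does not call coverage() or any other Biostrings function; the integer
vector is built by hand from the start/width values of mask0 quoted
above, and masked_in_view() is a hypothetical helper name, not part of
any package.

```r
# Mask geometry copied from the mask0 example quoted above.
subject_length <- 29L
mask_start <- c(3L, 10L, 25L)
mask_width <- c(6L, 8L, 5L)

# Integer representation of the mask: one count per subject position,
# > 0 where the position is masked, 0 elsewhere.
mask_cov <- integer(subject_length)
for (i in seq_along(mask_start)) {
  idx <- seq(mask_start[i], length.out = mask_width[i])
  mask_cov[idx] <- mask_cov[idx] + 1L
}

# Hypothetical helper: masked positions inside a view, numbered
# relative to the start of the view (1 = first letter of the view).
masked_in_view <- function(cov, start, end) {
  which(cov[start:end] > 0L)
}

sum(mask_cov > 0L)                  # total number of masked positions
masked_in_view(mask_cov, 7L, 29L)   # masked positions in the view 7..29
```

For the view 7..29 this reports relative positions 1, 2, 4 to 11 and
19 to 23, which are the '+' positions in view [7] of the hard-masked
output above.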

Sean

>>
>> Thanks,
>> Sean
>>
>>
>> > sessionInfo()
>>
>> R version 2.8.0 Under development (unstable) (2008-05-12 r45677)
>> i386-apple-darwin8.10.1
>>
>> locale:
>> C
>>
>> attached base packages:
>> [1] tools     stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] BSgenome.Hsapiens.UCSC.hg18_1.3.7 BSgenome_1.9.3
>> [3] Biostrings_2.9.17                 Biobase_2.1.3
>
>