[Bioc-devel] Views of a MaskedDNAstring in Biostrings

Herve Pages hpages at fhcrc.org
Wed Jun 4 21:17:33 CEST 2008


Hi Sean,

Sean Davis wrote:
> Herve,
> 
> I have been playing with the new version of Biostrings--very nice,
> indeed!  Is it possible to generate a view on a MaskedDNAString object
> using the newest version of Biostrings?  I tried using views(.....) on
> human chr1 and got back a DNAString object rather than a
> MaskedDNAString object.

No, strictly speaking, you cannot create views on a MaskedDNAString
object. You can still call views() on a MaskedDNAString object, but
the masks will be dropped, so you will get back a set of views on the
original (unmasked) sequence.
The XStringViews container has been around for a while and its
'subject' slot has been of type XString from the beginning (there
has been some renaming in the meantime e.g. BString -> XString and
BStringViews -> XStringViews, but these containers remain basically
the same).
We could change the definition of the XStringViews class to allow
the 'subject' slot to be a MaskedXString object (in addition to an
XString object), or we could introduce a new class (MaskedXStringViews)
that would extend the XStringViews class. I think I prefer the latter,
but I'm not sure yet; I need to think about it a bit more.
But having a container for storing views on a MaskedXString object
will add yet another level of complexity, and there are lots of details
to examine before this new container fits nicely into the global
picture.
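To illustrate why the masks get lost, here is a plain base R sketch (it does not use Biostrings at all). It assumes a hypothetical representation of a soft-masked sequence as a list holding the letters plus a separate table of masked ranges, roughly the split a MaskedDNAString keeps; the names `masked` and `soft_views` are made up for this example. Extracting the views only reads the letters, so the masked ranges never reach the result:

```r
# Hypothetical stand-in for a MaskedDNAString: the letters and the
# masked ranges are stored separately (a "soft" mask).
masked <- list(
  seq  = "ACACAACTAGATAGNACTNNGAGAGACGC",
  mask = data.frame(start = c(3, 10, 25), width = c(6, 8, 5))
)

# Views are taken on the letters only; masked$mask is never consulted,
# so every view shows the original, unmasked sequence.
soft_views <- substring(masked$seq, 1:7, 23:29)
print(soft_views)
```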

In the meantime, you can work around this by using "hard masking".
If you inject a hard mask into your sequence:

   > mask0 <- Mask(mask.width=29, start=c(3, 10, 25), width=c(6, 8, 5))
   > x <- DNAString("ACACAACTAGATAGNACTNNGAGAGACGC")
   > masks(x) <- mask0
   > x
     29-letter "MaskedDNAString" instance (# for masking)
   seq: AC######A########TNNGAGA#####
   masks:
     maskedwidth maskedratio active
   1          19   0.6551724   TRUE
   > y <- injectHardMask(x)
   > y
     29-letter "DNAString" instance
   seq: AC++++++A++++++++TNNGAGA+++++

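For readers without Biostrings at hand, the effect of injectHardMask() on this toy example can be mimicked in base R. The helper `inject_hard_mask_sketch()` below is hypothetical, not a Biostrings function: it simply overwrites every masked position with '+', using the same start/width values as mask0 above. It is a sketch of the idea, not of how Biostrings implements it:

```r
# Hypothetical helper (not part of Biostrings): replace every position
# covered by a mask range with a masking letter, mimicking the result
# of injectHardMask() on a MaskedDNAString.
inject_hard_mask_sketch <- function(seq, start, width, letter = "+") {
  chars <- strsplit(seq, "")[[1]]
  for (i in seq_along(start)) {
    # Positions covered by the i-th masked range.
    pos <- start[i]:(start[i] + width[i] - 1L)
    chars[pos] <- letter
  }
  paste(chars, collapse = "")
}

x <- "ACACAACTAGATAGNACTNNGAGAGACGC"
y <- inject_hard_mask_sketch(x, start = c(3, 10, 25), width = c(6, 8, 5))
print(y)
```

The printed string matches the 'seq:' line of the hard-masked DNAString shown above.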
Then, creating views on it keeps the hard mask information:

   > views(y, 1:7, 23:29)
     Views on a 29-letter DNAString subject
   subject: AC++++++A++++++++TNNGAGA+++++
   views:
       start end width
   [1]     1  23    23 [AC++++++A++++++++TNNGAG]
   [2]     2  24    23 [C++++++A++++++++TNNGAGA]
   [3]     3  25    23 [++++++A++++++++TNNGAGA+]
   [4]     4  26    23 [+++++A++++++++TNNGAGA++]
   [5]     5  27    23 [++++A++++++++TNNGAGA+++]
   [6]     6  28    23 [+++A++++++++TNNGAGA++++]
   [7]     7  29    23 [++A++++++++TNNGAGA+++++]

This works because, with a hard mask, the masking letter ('+', which
belongs to the DNAString alphabet) is really *in* the sequence.
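The same point can be shown in base R: once the '+' letters are part of the character string, any window extracted from it carries them along. This is a sketch using substring() on a plain character vector (not Biostrings' views()); `views_sketch` is a made-up name, and the starts (1:7) and ends (23:29) are the ones used above:

```r
# Hard-masked sequence: the '+' letters are part of the string itself.
y <- "AC++++++A++++++++TNNGAGA+++++"
starts <- 1:7
ends <- 23:29

# substring() is vectorised over first/last, giving one window per pair.
views_sketch <- substring(y, starts, ends)
print(data.frame(start = starts, end = ends,
                 width = nchar(views_sketch), view = views_sketch))
```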

Hope this helps,
H.

> 
> Thanks,
> Sean
> 
> 
>> sessionInfo()
> R version 2.8.0 Under development (unstable) (2008-05-12 r45677)
> i386-apple-darwin8.10.1
> 
> locale:
> C
> 
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
> [1] BSgenome.Hsapiens.UCSC.hg18_1.3.7 BSgenome_1.9.3
> [3] Biostrings_2.9.17                 Biobase_2.1.3
