[Bioc-devel] Views of a MaskedDNAstring in Biostrings
Herve Pages
hpages at fhcrc.org
Wed Jun 4 21:17:33 CEST 2008
Hi Sean,
Sean Davis wrote:
> Herve,
>
> I have been playing with the new version of Biostrings--very nice,
> indeed! Is it possible to generate a view on a MaskedDNAString object
> using the newest version of Biostrings? I tried using views(.....) on
> human chr1 and got back a DNAString object rather than a
> MaskedDNAString object.
No, strictly speaking, you cannot create views on a MaskedDNAString
object. You can still call views() on a MaskedDNAString object but
the masks will be dropped so you will get a set of views on the
original (unmasked) sequence in return.
The XStringViews container has been around for a while and its
'subject' slot has been of type XString from the beginning (there
has been some renaming in the meantime e.g. BString -> XString and
BStringViews -> XStringViews, but these containers remain basically
the same).
We could change the definition of the XStringViews class to allow
the 'subject' slot to be a MaskedXString object (in addition to being
an XString object) or we could introduce a new class (MaskedXStringViews)
that would extend the XStringViews class. I think I prefer the latter,
but I am not sure yet; I would need to think a little more about it.
But having a container for storing views on a MaskedXString object
will add yet another level of complexity and there are lots of details
that will need to be examined to make this new container fit nicely
into the global picture.
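To make the two options a bit more concrete, here is a rough S4 sketch.
It uses toy classes only: every "Toy..." name and every slot below is
hypothetical and made up for illustration, and none of this is the actual
Biostrings class hierarchy or a committed design.

```r
library(methods)

# Toy stand-ins (hypothetical), NOT the real Biostrings classes.
setClass("ToyXString", representation(seq = "character"))
setClass("ToyMaskedXString",
         representation(unmasked = "ToyXString",
                        mask_start = "integer",
                        mask_width = "integer"))

# Option 1: widen the 'subject' slot with a class union so that it
# accepts either a plain or a masked sequence.
setClassUnion("ToyXStringOrMasked", c("ToyXString", "ToyMaskedXString"))
setClass("ToyViewsA",
         representation(subject = "ToyXStringOrMasked",
                        start = "integer",
                        width = "integer"))

# Option 2: keep the views class as it is and add a subclass that
# carries the mask information in extra slots (one possible layout).
setClass("ToyViewsB",
         representation(subject = "ToyXString",
                        start = "integer",
                        width = "integer"))
setClass("ToyMaskedViewsB",
         contains = "ToyViewsB",
         representation = representation(mask_start = "integer",
                                         mask_width = "integer"))

s <- new("ToyXString", seq = "ACACAACTAG")
m <- new("ToyMaskedXString", unmasked = s,
         mask_start = 3L, mask_width = 4L)

# Option 1 stores the masked object directly in 'subject'.
views_a <- new("ToyViewsA", subject = m, start = 1L, width = 5L)

# Option 2 objects are still valid ToyViewsB objects, so existing
# methods written for ToyViewsB would dispatch on them.
views_b <- new("ToyMaskedViewsB", subject = s, start = 1L, width = 5L,
               mask_start = 3L, mask_width = 4L)
```

With option 1, every existing method has to cope with two kinds of
subject; with option 2, existing methods keep working and only the
mask-aware behaviour needs new methods.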
In the meantime, you can work around this by using "hard masking".
If you inject a hard mask in your sequence:
> mask0 <- Mask(mask.width=29, start=c(3, 10, 25), width=c(6, 8, 5))
> x <- DNAString("ACACAACTAGATAGNACTNNGAGAGACGC")
> masks(x) <- mask0
> x
29-letter "MaskedDNAString" instance (# for masking)
seq: AC######A########TNNGAGA#####
masks:
  maskedwidth maskedratio active
1          19   0.6551724   TRUE
> y <- injectHardMask(x)
> y
29-letter "DNAString" instance
seq: AC++++++A++++++++TNNGAGA+++++
Then, creating views on it keeps the hard mask information:
> views(y, 1:7, 23:29)
Views on a 29-letter DNAString subject
subject: AC++++++A++++++++TNNGAGA+++++
views:
    start end width
[1]     1  23    23 [AC++++++A++++++++TNNGAG]
[2]     2  24    23 [C++++++A++++++++TNNGAGA]
[3]     3  25    23 [++++++A++++++++TNNGAGA+]
[4]     4  26    23 [+++++A++++++++TNNGAGA++]
[5]     5  27    23 [++++A++++++++TNNGAGA+++]
[6]     6  28    23 [+++A++++++++TNNGAGA++++]
[7]     7  29    23 [++A++++++++TNNGAGA+++++]
This works because, with a hard mask, the masking letter ('+', which
belongs to the DNAString alphabet) is really *in* the sequence.
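Since the '+' letters are part of the sequence itself, you can find out
how much of each view is masked with ordinary string operations. The
sketch below uses base R only, on a plain character copy of the
hard-masked sequence and the same view coordinates as above; the variable
names are made up for illustration and no Biostrings function is involved.

```r
# Hard-masked sequence copied from the example above, as a plain string.
y <- "AC++++++A++++++++TNNGAGA+++++"

# Same view coordinates as in views(y, 1:7, 23:29).
starts <- 1:7
ends <- 23:29

# Extract the letters covered by each view.
view_seqs <- substring(y, starts, ends)

# Count the hard-masked ('+') letters in each view by comparing the
# length of the view with its length after removing every '+'.
n_masked <- nchar(view_seqs) - nchar(gsub("+", "", view_seqs, fixed = TRUE))

# Fraction of each view that is masked.
masked_ratio <- n_masked / nchar(view_seqs)

print(data.frame(start = starts, end = ends, n_masked, masked_ratio))
```

The same idea should apply to whatever the views on the hard-masked
DNAString return, once they are converted to character strings.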
Hope this helps,
H.
>
> Thanks,
> Sean
>
>
>> sessionInfo()
> R version 2.8.0 Under development (unstable) (2008-05-12 r45677)
> i386-apple-darwin8.10.1
>
> locale:
> C
>
> attached base packages:
> [1] tools stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] BSgenome.Hsapiens.UCSC.hg18_1.3.7 BSgenome_1.9.3
> [3] Biostrings_2.9.17 Biobase_2.1.3