[BioC] how to modify a Biostrings object
Patrick Aboyoun
paboyoun at fhcrc.org
Mon Feb 9 04:20:19 CET 2009
Paul,
Herve Pages is the author of this material and probably has a more
clever response than mine, but the replaceLetterAt function (see
help(replaceLetterAt) for details) in the Biostrings package has a
method for DNAString instances that you can leverage.
> library(BSgenome.Hsapiens.UCSC.hg18)
> chr9 <- Hsapiens [['chr9']]
> hg18.9q32 <- subseq (chr9, 113900001, 116700000)
> unmasked(hg18.9q32)
2800000-letter "DNAString" instance
seq:
TGGCTGCACCACAGCCATCCCACTAGGGGGAGCTGT...GTGACTGGAGCGTGGAATGTGGAGTTAGGCAAACTT
> replaceLetterAt(unmasked(hg18.9q32), c(1, 3, 5), "ACG")
2800000-letter "DNAString" instance
seq:
AGCCGGCACCACAGCCATCCCACTAGGGGGAGCTGT...GTGACTGGAGCGTGGAATGTGGAGTTAGGCAAACTT
> new.hg18.9q32 <- replaceLetterAt(unmasked(hg18.9q32), c(1, 3, 5), "ACG")
> masks(new.hg18.9q32) <- masks(hg18.9q32)
> new.hg18.9q32
2800000-letter "MaskedDNAString" instance (# for masking)
seq:
AGCCGGCACCACAGCCATCCCACTAGGGGGAGCTGT...GTGACTGGAGCGTGGAATGTGGAGTTAGGCAAACTT
masks:
maskedwidth maskedratio active names desc
1 0 0.0000000 TRUE AGAPS assembly gaps
2 0 0.0000000 TRUE AMB intra-contig ambiguities (empty)
3 1407815 0.5027911 FALSE RM RepeatMasker
4 21329 0.0076175 FALSE TRF Tandem Repeats Finder [period<=12]
all masks together:
maskedwidth maskedratio
1408618 0.5030779
all active masks together:
maskedwidth maskedratio
0 0
Patrick
Quoting Paul Shannon <pshannon at systemsbiology.org>:
> I wish to introduce some mutations into a segment of Hg18 chromosome 9,
> as part of a simulation I wish to run. I am having difficulty figuring
> out how to modify the BSgenome-related data structures.
>
> I create my (wild-type) variables like this:
>
> chr9 <- Hsapiens [['chr9']]
> hg18.9q32 <<- subseq (chr9, 113900001, 116700000)
>
> How would one create a mutant variant of hg18.9q32? My wish is to
> incrementally -- one gene at a time -- introduce SNPs (and later,
> indels and translocations). Sometimes these changes are intentionally
> random, sometimes I want them at very specific places, to 'spike in' an
> interesting mutation.
>
> I see that one can injectSNPs into a genome:
>
> Hsapiens.snp <- injectSNPs (Hsapiens, "SNPlocs.Hsapiens.dbSNP.20071016")
>
> I could find no documentation for the SNPlocs data structure.
>
> hg18.9q32 appears to be a MaskedDNAString, apparently derived from
> Biostring and XString, all of which seem to be immutable.
>
> Is there any way to do this?
>
> Let me apologize in advance if this is documented and I missed it. I
> have combed the vignettes and the help pages and found nothing.
>
> Many thanks,
>
> - Paul
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list