[Bioc-sig-seq] Extract masked sequences

Patrick Aboyoun paboyoun at fhcrc.org
Wed Jan 20 01:38:52 CET 2010


Arnaud,
The BSgenome object, in this case Hsapiens, contains references to 
on-disk storage of information such as masks. Since this information is 
not held in memory and the data stored on disk are considered read-only, 
you cannot change the mask information on a BSgenome object. Instead, 
you need to modify the masks chromosome by chromosome, after each one 
has been loaded into memory, as you showed in your code below.
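
As a minimal sketch of that per-chromosome approach (not a definitive 
recipe): the helper below, maskedRegion, is a hypothetical name and is 
not part of Biostrings or BSgenome. It activates every mask on an 
in-memory copy of a chromosome, replaces the masked letters with "N" via 
injectHardMask, and then extracts a region. A small toy sequence with a 
made-up mask stands in for the chromosome so the block runs without the 
genome package.

```r
library(Biostrings)

# Hypothetical helper (an assumption for illustration, not a Biostrings
# or BSgenome function): return positions start..end of a masked
# sequence, with every masked letter replaced by `letter`.
maskedRegion <- function(chr, start, end, letter = "N") {
    # Activate all masks; this changes only the in-memory copy.
    active(masks(chr)) <- TRUE
    # Turn the masked sequence into a plain sequence in which the
    # masked positions hold `letter`.
    hard <- injectHardMask(chr, letter = letter)
    subseq(hard, start = start, end = end)
}

# Toy stand-in for a chromosome: 20 letters, positions 5 to 8 masked.
toy <- DNAString("ACGTACGTACGTACGTACGT")
masks(toy) <- Mask(mask.width = length(toy), start = 5, end = 8)

maskedRegion(toy, start = 3, end = 10)
```

With the genome package loaded, the same call should apply to a real 
chromosome, for example maskedRegion(Hsapiens$chr1, 1000, 1200). The 
masks stored on disk are left untouched.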

What is your use case that motivated your e-mail?

If you never want to deal with masks, you can always use the unmasked 
function to strip the masks when you load the chromosome:

 > unmasked(Hsapiens$chr1)
  247249719-letter "DNAString" instance
seq: 
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCC...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN



Patrick



Droit Arnaud wrote:
> Hi,
>
> I am wondering if anybody can help me generate masked sequences (masked by RepeatMasker, for instance).
>
> I'm currently using BSgenome to extract sequences for regions from a BED file, such as:
>
> library(BSgenome.Hsapiens.UCSC.hg18)
> genome<-Hsapiens
> FastaSeq<-getSeq(genome,"chr1",start=1000,end=1200, as.character=FALSE)
>
> I know that BSgenome contains masks that can be applied by using:
>
> chr1 <- genome$chr1
> active(masks(chr1)) <- TRUE
>
> So I am trying to use this to change the masks on the genome object itself, but I cannot modify it:
>
> active(masks(genome$chr1)) <- TRUE
>  Error in `$<-`(`*tmp*`, "chr1", value = <S4 object of class "MaskedDNAString">) :
>  no method for assigning subsets of this S4 class
>
> Is there a way to get the masked sequence with the getSeq function?
>
> Thanks.
>
> Arnaud.
