[Bioc-sig-seq] making target from fasta file

Herve Pages hpages at fhcrc.org
Wed Jun 4 22:18:54 CEST 2008

Hi Joseph,

Joseph Dhahbi, P.h.D. wrote:
> Hi Herve
> Thank you very much for your help. Using the built-in masks as you 
> suggested was easy.
> Do I need to do it for each chromosome separately? Is there a way to 
> apply it to the whole genome and create MaskedDNAString of the whole 
> genome?

There is no way to create a MaskedDNAString object for the whole genome. Note
that this would be a very big object, and most machines would not have enough
memory for it. Of course, with a medium-size genome like the fly's, the problem
is not as severe as with the human genome, but still...

How about using the trick I sent you in a previous email (see that email
for the details):

   > allrepeats <- read.XStringViews("dm3rm", format="fasta", subjectClass="DNAString", collapse="-")
   > counts <- countPDict(pdict, subject(allrepeats))

Also, in my previous email, I mentioned that I tried to reproduce the problem
you had with read.DNAStringSet() but couldn't, so I asked for the output of
your sessionInfo(). Did read.DNAStringSet() finally work for you?

> Once I create a whole genome MaskedDNAString, I would like to use the 
> runAnalysis1 script in the GenomeSearching.pdf to analyze my input 
> dictionary.

Look at the runAnalysis2 script instead. I think it's closer to what you are
trying to do (you have a dictionary of patterns, not a single pattern).
You'll need to make a few modifications though, e.g. use countPDict()
instead of matchPDict(), and store the results for each chromosome in a
list that you return to the caller at the end of the script. There is no
need to write the results to a file as the vignette does.
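As a rough illustration of those modifications, here is a minimal R sketch. It is
not the actual runAnalysis2 code from the vignette: the function name
countPDictByChrom and its countFUN argument are hypothetical. It assumes that
Biostrings' countPDict() accepts whatever genome[[seqname]] returns for your
BSgenome object, and whether that object carries the masks you want active
depends on your package versions. Loading one chromosome at a time keeps memory
use low, which is the whole point of not building a whole-genome object.

```r
## Hypothetical helper (not part of Biostrings or the vignette): count every
## pattern of 'pdict' on each chromosome named in 'seqnames' and return the
## per-chromosome results as a named list instead of writing them to a file.
## 'countFUN' defaults to Biostrings::countPDict; the default is only evaluated
## (and Biostrings only needed) if no other counting function is supplied.
countPDictByChrom <- function(pdict, genome, seqnames,
                              countFUN = Biostrings::countPDict) {
  counts <- lapply(seqnames, function(seqname) {
    subject <- genome[[seqname]]   # load one chromosome at a time
    countFUN(pdict, subject)       # one count per pattern in 'pdict'
  })
  names(counts) <- seqnames        # list element i belongs to seqnames[i]
  counts
}

## Example usage (assumes BSgenome.Dmelanogaster.UCSC.dm3 is installed and
## 'dict' is a DNAStringSet of equal-width patterns):
##   library(BSgenome.Dmelanogaster.UCSC.dm3)
##   pdict  <- PDict(dict)
##   counts <- countPDictByChrom(pdict, Dmelanogaster, seqnames(Dmelanogaster))
##   total  <- Reduce(`+`, counts)  # genome-wide count for each pattern
```

Returning a named list (rather than a file) lets the caller decide whether to
inspect per-chromosome counts or sum them into genome-wide totals.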

