Thank you very much.

I'll check it out tonight. In the mean time, I'll have to set up R 2.7.0.

Thanks again.

On Tue, May 13, 2008 at 10:05 AM, Herve Pages <hpages@fhcrc.org> wrote:

> Hi Anh,
>
> I fixed a few things in Biostrings:
>
> - The "as.character" method for XStringViews (and XStringSet) objects is
> much
>  faster now (about 250x faster than yesterday for your use case):
>
>    library(BSgenome.Hsapiens.UCSC.hg18)
>    c1 <- Hsapiens$chr1
>    m <- matchPattern('CTAG', c1)
>    mgaps <- gaps(m)
>    mgaps2 <- as.character(mgaps)  # now takes about 5 sec. instead of 20
> minutes
>
> - As a consequence, the getSeq() function is much faster too (it uses
> as.character()
>  internally):
>
>    mgaps3_start <- end(m)[-length(m)] + 1L
>    mgaps3_end <- start(m)[-1] - 1L
>    ## The empty gaps must be dropped because the 'start' and 'end' args of
> getSeq()
>    ## must verify 'start' <= 'end'
>    nonemptygaps <- mgaps3_start <= mgaps3_end
>    mgaps3_start <- mgaps3_start[nonemptygaps]
>    mgaps3_end <- mgaps3_end[nonemptygaps]
>    mgaps3 <- getSeq(Hsapiens, 'chr1', start=mgaps3_start, end=mgaps3_end)
>
>    identical(mgaps3, mgaps2)  # TRUE
>
> This will be available in BioC 2.2 (Biostrings 2.8.5) tonight around
> midnight
> (Seattle time) for R-2.7 users and in BioC 2.3 (Biostrings 2.9.8) around
> noon
> (Seattle time). As always, install with biocLite().
>
> Please let me know by posting here if you run into other problems. Thanks!
>
> H.
>
>
>
> hpages@fhcrc.org wrote:
>
> > Hi Anh,
> >
> > Quoting Anh Tran <popophobia@gmail.com>:
> >
> >  Hi all,
> > >
> > > I'm new to this mailing list and have a very simple question about
> > > digestion
> > > with restriction enzyme for the whole genome.
> > >
> > > I'm using package BSgenome.Hsapiens.UCSC.hg18 to find the cut site of
> > > enzyme. Is there a faster way (ie. pre-built package) to get the
> > > fragment
> > > sequence and it's location in UCSC genome browser format (Chr#: start
> > > bp -
> > > end bp).
> > >
> > > I'm planning on finding the restriction site location, then find the
> > > substring between two consecutive cut sites.
> > >
> > > Is there any faster way to do this?
> > >
> > > Here's the code that I'm thinking of:
> > >
> > > library(BSgenome.Hsapiens.UCSC.hg18)
> > > c1<-Hsapiens$chr1
> > > m<-matchPatterns('CTAG', c1)
> > >
> > > #And the for loop
> > > reslt<-NULL
> > > for (i in 1:(length(m)-1)) {
> > >   reslt<-rbind(reslt, cbind(start(m[i])+1, start(m[i+1])-3,
> > > getSeq(Hsapiens, 'chr1', start(m[i])+2, start(m[i+1])-2)))
> > > }
> > >
> > > It seems like this code is no where near perfect. Please give me some
> > > input.
> > >
> >
> > Yes looping on an 623303-element vector at the R level will take a long
> > time. To "invert" the views of an XStringViews object i.e. to make the
> > current gaps between the views become the new views and the current
> > views
> > become the new gaps, just do:
> >
> >  mgaps <- gaps(m)
> >
> > This is very fast! (< 0.3 sec. on my machine)
> >
> > Then if you really want this as a standard character vector, do:
> >
> >  mgaps2 <- as.character(mgaps)
> >
> > but, unfortunately this is still very slow at the moment because the
> > main loop in the "as.character" method for XStringViews objects is still
> > written in R and not in C (we need to change this ASAP).
> >
> > You could also use the getSeq() function in a vectorized way if you
> > really wanted to use getSeq(). Then your code would look like:
> >
> >  reslt_start <- end(m)[-length(m)] + 1L
> >  reslt_end <- start(m)[-1] - 1L
> >  reslt <- getSeq(Hsapiens, 'chr1', start(m)[i]+2, start(m)[i+1]-2))
> >
> > at least in theory... but I realized that there are a few problems
> > with the current getSeq() implementation that I need to address too,
> > sorry!
> >
> > Note that you'll need the latest available version of Biostrings in
> > order
> > to use the gaps() function.
> >
> > Cheers,
> > H.
> >
> >
> >
> > > Thanks
> > >
> > > --
> > > Regards,
> > > Anh Tran
> > >
> > >    [[alternative HTML version deleted]]
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor@stat.math.ethz.ch
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives:
> > > http://news.gmane.org/gmane.science.biology.informatics.conductor
> > >
> > >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>
>


-- 
Regards,
Anh Tran

	[[alternative HTML version deleted]]

