[BioC] matching sRNA sequences with whole data
Valerie Obenchain
vobencha at fhcrc.org
Tue Aug 9 17:37:49 CEST 2011
Hi Konika,
The "Biostrings BSgenome Overview" link on this page is a great summary
of string matching:
http://bioconductor.org/help/course-materials/2011/BioC2011/
Specifically, I think the vmatchPattern() and matchPDict() functions
will be most helpful to you.
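
If the comparison in your loop really is whole-sequence equality (each entry of urfreq[,1] compared to complete entries of glr4[,1], and so on), then tabulating each sample once and looking the queries up may already be enough, before reaching for Biostrings. Here is a minimal base R sketch of that idea. The toy data and the helper name count_exact() are made up for illustration, and I am assuming the first column of each object holds the sequences as character strings. Unlike your loop, it writes 0 where a query has no match instead of leaving the cell untouched.

```r
# Toy stand-ins for the objects in the quoted loop (hypothetical data).
glr4 <- data.frame(seq = c("ACGT", "ACGT", "TTGA", "GGCC"), stringsAsFactors = FALSE)
glr5 <- data.frame(seq = c("TTGA", "TTGA", "TTGA"), stringsAsFactors = FALSE)
glr6 <- data.frame(seq = c("GGCC", "ACGT"), stringsAsFactors = FALSE)
urfreq <- data.frame(seq = c("ACGT", "TTGA", "CCCC"), stringsAsFactors = FALSE)

# Hypothetical helper: count how often each query occurs, as a whole
# string, in `reads`. The reads are tabulated once, then every query is
# looked up in that table, so there is no per-query scan of the data.
count_exact <- function(queries, reads) {
  tab <- table(reads)
  counts <- as.integer(tab[match(queries, names(tab))])
  counts[is.na(counts)] <- 0L  # queries that never occur get a count of 0
  counts
}

urfreq$glr4 <- count_exact(urfreq$seq, glr4[, 1])
urfreq$glr5 <- count_exact(urfreq$seq, glr5[, 1])
urfreq$glr6 <- count_exact(urfreq$seq, glr6[, 1])
print(urfreq)
```

This only covers identical whole strings. If the short sequences have to be located inside longer target sequences, that is the case vmatchPattern() and matchPDict() are meant for.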
Valerie
On 08/08/2011 04:25 AM, chawla wrote:
> Hi
> I would like to know a faster method of counting only the perfect
> matches between a set of query sequences and a target sequence file.
> Both are sets of nucleotide sequences, in large numbers.
> I tried:
> for (i in 1:100)
> #for (i in 1:nrow(urfreq))
> {
>   # rows of each sample whose sequence equals the i-th query sequence
>   pos1 <- which(glr4[,1] == urfreq[i,1])
>   pos2 <- which(glr5[,1] == urfreq[i,1])
>   pos3 <- which(glr6[,1] == urfreq[i,1])
>   # store the match count per sample (columns 2 to 4) when there are hits
>   if (length(pos1) > 0)
>   {
>     urfreq[i,2] <- length(pos1)
>   }
>   if (length(pos2) > 0)
>   {
>     urfreq[i,3] <- length(pos2)
>   }
>   if (length(pos3) > 0)
>   {
>     urfreq[i,4] <- length(pos3)
>   }
> }
> Since the target data file is huge, this piece of code takes 22 min for
> only 100 sequences, while I need to find the frequency of over 3 million
> sequences in the three sample data sets (glr4, glr5 and glr6).
> Is there any package or function for such matching?
> Thanks
> Konika
>