[BioC] matching sRNA sequences with whole data

Valerie Obenchain vobencha at fhcrc.org
Tue Aug 9 17:37:49 CEST 2011


Hi Konika,

The "Biostrings BSgenome Overview" link on this page is a great summary 
of string matching:

     http://bioconductor.org/help/course-materials/2011/BioC2011/

Specifically, I think the vmatchPattern() and matchPDict() functions 
will be most helpful to you.
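
As a rough illustration only: the loop quoted below compares whole 
sequences with ==, so for that exact-equality case one possible base-R 
sketch is to tabulate each sample once and look the counts up, rather 
than scanning every sample for every query. The toy data and the helper 
name count_exact() are assumptions made up for this example; only glr4, 
glr5, glr6 and urfreq come from the original post. If the sequences 
instead need to be found inside longer sequences, the Biostrings 
functions named above are the ones to look at.

```r
# Toy stand-ins for the poster's objects (hypothetical data; as in the
# original post, the sequences sit in column 1).
glr4 <- data.frame(seq = c("ACGT", "ACGT", "TTGA"), stringsAsFactors = FALSE)
glr5 <- data.frame(seq = c("TTGA", "GGCC"), stringsAsFactors = FALSE)
glr6 <- data.frame(seq = c("ACGT"), stringsAsFactors = FALSE)
urfreq <- data.frame(seq = c("ACGT", "TTGA", "CCCC"),
                     glr4 = 0L, glr5 = 0L, glr6 = 0L,
                     stringsAsFactors = FALSE)

# Hypothetical helper: for each query, count how many elements of `reads`
# are identical to it (whole-sequence equality, no mismatches).
count_exact <- function(queries, reads) {
  tab <- table(reads)                    # one pass over the sample
  idx <- match(queries, names(tab))      # NA where a query never occurs
  counts <- as.vector(tab)[idx]
  counts[is.na(counts)] <- 0L            # absent queries get a count of 0
  as.integer(counts)
}

urfreq$glr4 <- count_exact(urfreq$seq, glr4[, 1])
urfreq$glr5 <- count_exact(urfreq$seq, glr5[, 1])
urfreq$glr6 <- count_exact(urfreq$seq, glr6[, 1])
print(urfreq)
```

Note that this sketch writes 0 for sequences with no match, whereas the 
quoted loop leaves such entries untouched.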

Valerie


On 08/08/2011 04:25 AM, chawla wrote:
> Hi,
> I would like to know a faster method of obtaining the frequency of only 
> perfect matches between a data sequence file and a target sequence file.
> Both are sets of nucleotide sequences, but in large numbers.
> I tried:
> for (i in 1:100)
> #for (i in 1:nrow(urfreq))
> {
>     # rows of each sample whose sequence equals the i-th query sequence
>     pos1 <- which(glr4[,1] == urfreq[i,1])
>     pos2 <- which(glr5[,1] == urfreq[i,1])
>     pos3 <- which(glr6[,1] == urfreq[i,1])
>     # store a count only when the sample has at least one match
>     if (length(pos1) > 0)
>     {
>         urfreq[i,2] <- length(pos1)
>     }
>     if (length(pos2) > 0)
>     {
>         urfreq[i,3] <- length(pos2)
>     }
>     if (length(pos3) > 0)
>     {
>         urfreq[i,4] <- length(pos3)
>     }
> }
> Since the target data file is huge, this piece of code takes 22 min for 
> only 100 sequences, while I need to find the frequency of over 3 million 
> sequences in the three sample data sets (glr4, glr5 and glr6).
> Is there any package/function for such matching?
> Thanks
> Konika
>


