[BioC] matching sRNA sequences with whole data

chawla chawla at bio.ntnu.no
Mon Aug 8 13:25:59 CEST 2011


Hi,
I want to know a faster method for obtaining the frequency of only
perfect matches between a set of data sequences and a target sequence file.
Both are sets of nucleotide sequences, and both are large.
I tried:
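for (i in 1:100)
#for (i in 1:nrow(urfreq))
{
  # rows of each sample whose sequence equals the i-th query sequence
  pos1 <- which(glr4[,1] == urfreq[i,1])
  pos2 <- which(glr5[,1] == urfreq[i,1])
  pos3 <- which(glr6[,1] == urfreq[i,1])
  # store the number of exact matches, one column per sample
  if (length(pos1) > 0)
  {
    urfreq[i,2] <- length(pos1)
  }
  if (length(pos2) > 0)
  {
    urfreq[i,3] <- length(pos2)
  }
  if (length(pos3) > 0)
  {
    urfreq[i,4] <- length(pos3)
  }
}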
Since the target data files are huge, this piece of code takes 22 min for
only 100 sequences, while I need to find the frequency of over 3 million
sequences in the three sample data sets (glr4, glr5 and glr6).
Is there any package or function for such matching?
Thanks
Konika
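
One possible way to avoid the loop, given only as a sketch: tabulate each
sample once with table() and look the query sequences up with match(), both
from base R. This assumes that the first column of glr4, glr5, glr6 and
urfreq holds the sequences as plain strings, as the loop above suggests. The
toy data and the helper name count_exact are made up for illustration.
Unlike the loop, sequences with no match are set to 0 rather than left
unchanged.

```r
# Toy stand-ins for the data in the post (hypothetical values).
glr4 <- data.frame(seq = c("ACGT", "ACGT", "TTGA", "GGCC"),
                   stringsAsFactors = FALSE)
glr5 <- data.frame(seq = c("TTGA", "TTGA", "TTGA"),
                   stringsAsFactors = FALSE)
glr6 <- data.frame(seq = c("GGCC", "ACGT"),
                   stringsAsFactors = FALSE)
urfreq <- data.frame(seq = c("ACGT", "TTGA", "CCCC"),
                     glr4 = 0L, glr5 = 0L, glr6 = 0L,
                     stringsAsFactors = FALSE)

# Hypothetical helper: count exact occurrences of each query in one sample.
# The sample is tabulated once, then every query is looked up by name,
# so the sample is not rescanned for each query sequence.
count_exact <- function(queries, sample_seqs) {
  tab <- table(sample_seqs)                      # frequency of each distinct sequence
  counts <- as.integer(tab[match(queries, names(tab))])
  counts[is.na(counts)] <- 0L                    # queries absent from the sample
  counts
}

urfreq[, 2] <- count_exact(urfreq[, 1], glr4[, 1])
urfreq[, 3] <- count_exact(urfreq[, 1], glr5[, 1])
urfreq[, 4] <- count_exact(urfreq[, 1], glr6[, 1])

print(urfreq)
```

Each sample is scanned once to build its table, so the cost no longer grows
with the number of query sequences multiplied by the size of the sample.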


