[BioC] matching sRNA sequences with whole data
chawla
chawla at bio.ntnu.no
Mon Aug 8 13:25:59 CEST 2011
Hi,
I am looking for a faster way to obtain the frequency of perfect (exact)
matches between a set of query sequences and a target sequence file.
Both are sets of nucleotide sequences, and both are large.
I tried:
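for (i in 1:100)
#for (i in 1:nrow(urfreq))
{
  # rows in each sample whose sequence equals the i-th query sequence
  pos1 <- which(glr4[,1] == urfreq[i,1])
  pos2 <- which(glr5[,1] == urfreq[i,1])
  pos3 <- which(glr6[,1] == urfreq[i,1])
  # store the count only when at least one exact match was found
  if (length(pos1) > 0)
  {
    urfreq[i,2] <- length(pos1)
  }
  if (length(pos2) > 0)
  {
    urfreq[i,3] <- length(pos2)
  }
  if (length(pos3) > 0)
  {
    urfreq[i,4] <- length(pos3)
  }
}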
Since the target data files are huge, this piece of code takes 22 min for
only 100 sequences, while I need to find the frequencies of over 3 million
sequences in the three samples (glr4, glr5 and glr6).
Is there any package or function for this kind of matching?
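For reference, the counting done by the loop can probably be expressed without a per-sequence scan. What follows is only a minimal base R sketch, not a tested solution for the real data. It assumes column 1 of urfreq holds the unique query sequences and column 1 of each glr table holds one read per row. The toy data frames and the helper name count_exact are hypothetical and exist only for illustration. Unlike the loop, it writes 0 where a sequence has no match, which assumes the count columns start at 0.

```r
# Hypothetical toy data standing in for the real objects (assumption).
urfreq <- data.frame(seq = c("ACGT", "TTGA", "GGCC"),
                     glr4 = 0L, glr5 = 0L, glr6 = 0L,
                     stringsAsFactors = FALSE)
glr4 <- data.frame(seq = c("ACGT", "ACGT", "GGCC"), stringsAsFactors = FALSE)
glr5 <- data.frame(seq = c("TTGA", "AAAA"), stringsAsFactors = FALSE)
glr6 <- data.frame(seq = c("CCCC"), stringsAsFactors = FALSE)

# Hypothetical helper: exact-match counts of each query within one sample.
count_exact <- function(queries, reads) {
  tab <- table(reads)                      # tabulate every distinct read once
  idx <- match(queries, names(tab))        # position of each query, NA if absent
  counts <- as.integer(tab)[idx]           # look up the tabulated frequency
  counts[is.na(counts)] <- 0L              # queries never seen get a count of 0
  counts
}

# Fill the three count columns, one tabulation per sample.
urfreq[, 2] <- count_exact(urfreq[, 1], glr4[, 1])
urfreq[, 3] <- count_exact(urfreq[, 1], glr5[, 1])
urfreq[, 4] <- count_exact(urfreq[, 1], glr6[, 1])
```

The idea is that each sample is tabulated once and every query is then resolved by a single lookup, instead of comparing the whole sample against each query in turn. Whether table() fits in memory for the real files is something I have not checked.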
Thanks
Konika