[BioC] advice on Biostrings
Rafael A Irizarry
ririzarr at jhsph.edu
Tue Feb 21 22:19:20 CET 2006
Hi, I'm using Biostrings to count base content as well as the content of
pairs of bases. I'm using the following snippet of code:
###pmseq is a vector of character strings (not of the same nchar).
tmp <- sapply(pmseq, function(x) {
  y <- DNAString(x)
  c(alphabetFrequency(y)[2:5],  ## count A,T,G,C
    ## count GC or CG
    length(matchDNAPattern("GC", y)) + length(matchDNAPattern("CG", y)))
})
It is painfully slow. strsplit and grep were much faster for the first
part (counting bases), but using grep for the second part was not
straightforward.
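For reference, here is a minimal base-R sketch of the strsplit route, with the pair counts done through gregexpr rather than grep. It is only an illustration under assumptions: the example sequences and the helper name count_bases are made up, and this is not the code that was timed.

```r
## Hypothetical example input; the real pmseq is a character vector
## of sequences that need not have the same nchar.
pmseq <- c("ACGTGC", "GGCCGA", "TTTT")

## count_bases is an illustrative helper name, not part of any package.
count_bases <- function(x) {
  ## Split the sequence into single characters and count each base.
  chars <- strsplit(x, "")[[1]]
  base_counts <- vapply(c("A", "T", "G", "C"),
                        function(b) sum(chars == b),
                        integer(1))

  ## gregexpr() returns -1 when the pattern does not occur, so that
  ## case has to be mapped to a count of zero explicitly.
  count_hits <- function(pattern) {
    hits <- gregexpr(pattern, x, fixed = TRUE)[[1]]
    if (hits[1] == -1) 0L else length(hits)
  }

  c(base_counts, GC_or_CG = count_hits("GC") + count_hits("CG"))
}

## One column per sequence, rows A, T, G, C, GC_or_CG.
tmp <- sapply(pmseq, count_bases)
```

gregexpr reports non-overlapping matches, which is enough here because neither "GC" nor "CG" can overlap with another occurrence of itself.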
Any suggestions?
-r