[R] Faster text search in document database than with grep?
Witold E Wolski
wewolski at gmail.com
Mon Aug 3 11:25:23 CEST 2015
I have a database of text documents (letter sequences): several thousand
documents with approx. 1000-2000 letters each.
I need to find exact matches of short sequences of 3-15 letters in those
documents.
Even without any regexp patterns, the search for one 3-15 letter "word"
takes on the order of 1 s.
So for a database with several thousand documents it is on the order of
hours.
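
To make the setup concrete, here is a minimal sketch of such a search on toy data. The names `docs`, `words` and `hits` and all sequences are made up for illustration, and the use of grep() with fixed = TRUE is an assumption about how the exact matching is done, taken from the subject line rather than from actual code:

```r
# Toy stand-in for the document database (invented sequences).
docs <- c("ACDEFGHIKLMNPQRSTVWY",
          "MKTAYIAKQRQISFVKSHFS",
          "GHIKLACDEFMKTAYSTVWY")

# Short query "words" to look for (also invented).
words <- c("ACDEF", "MKTAY", "WWW")

# fixed = TRUE makes grep() match the pattern as a literal substring,
# without regexp interpretation. For each word we get the indices of
# the documents that contain it.
hits <- lapply(words, function(w) grep(w, docs, fixed = TRUE))
names(hits) <- words

hits
```

Each word triggers one full pass over all documents, so the total cost grows with the number of words times the total length of the database.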
The naive approach would be to use mcmapply, but then on standard
hardware I am still in the same order of magnitude, and since R is an
interactive programming environment this isn't a solution I would go for.
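
For reference, a sketch of what the mcmapply variant might look like, again on invented toy data. `docs`, `words`, `n_cores` and the choice of two cores are assumptions for illustration only. mcmapply() comes from the parallel package and relies on forking, so on Windows it can only run with mc.cores = 1:

```r
library(parallel)

# Toy stand-in for the document database and the query words (invented).
docs <- c("ACDEFGHIKLMNPQRSTVWY",
          "MKTAYIAKQRQISFVKSHFS",
          "GHIKLACDEFMKTAYSTVWY")
words <- c("ACDEF", "MKTAY", "WWW")

# Forking is not available on Windows, so fall back to a single core there.
# Two cores is an arbitrary choice for this sketch.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One task per query word; each task scans all documents with a literal
# (fixed = TRUE) match. SIMPLIFY = FALSE keeps the result as a list, and
# the list is named after the words because `words` is an unnamed
# character vector.
hits <- mcmapply(function(w) grep(w, docs, fixed = TRUE),
                 words, SIMPLIFY = FALSE, mc.cores = n_cores)

hits
```

Splitting the words across cores divides the wall-clock time by at most the number of cores, which is why the run time stays in the same order of magnitude on a typical desktop machine.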
But aren't there faster algorithmic solutions? Can anyone please point me
to an implementation available in R?
Thank you
Witold
--
Witold Eryk Wolski