[R] Use a sliding window to count substrings found in large strings

Immanuel mane.desk at googlemail.com
Wed Jul 7 18:25:40 CEST 2010


Hello everyone,


I'm looking for advice on how to do some tests on strings.
What I want to do is the following:

Given a set of strings (this is just an example; the real
strings/sequences are about 200-400 characters long):

String1 abcdefgh
String2 bcdefgop

Use a sliding window of size x to create a vector of all contiguous
subsequences of size x found in the set (order matters!).
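A minimal base-R sketch of this first step, assuming a window size of x = 3 and using the two example strings above. The helper name window_substrings is hypothetical, not an existing R function; it relies on the fact that substring() is vectorised over its start and end positions.

```r
# Example strings from the post; x = 3 is an assumed window size
strings <- c(String1 = "abcdefgh", String2 = "bcdefgop")
x <- 3

# Hypothetical helper: all substrings of length k, obtained by sliding
# a window one character at a time, in left-to-right order
window_substrings <- function(s, k) {
  n <- nchar(s)
  if (n < k) return(character(0))
  starts <- seq_len(n - k + 1)
  substring(s, starts, starts + k - 1)
}

# One vector of the distinct substrings found anywhere in the set,
# kept in order of first appearance ("abc" and "cba" stay distinct)
vocab <- unique(unlist(lapply(strings, window_substrings, k = x),
                       use.names = FALSE))
print(vocab)
```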

Then, for every string in the set, create a vector containing the number
of times each subsequence occurs in that particular string.
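One possible way to build all the count vectors at once, again a sketch assuming x = 3, the example strings, and the hypothetical helper window_substrings. Each row is one string and each column one subsequence; match() maps every window to its column index and tabulate() counts the indices, so subsequences absent from a string get a 0.

```r
strings <- c(String1 = "abcdefgh", String2 = "bcdefgop")
x <- 3  # assumed window size

# Hypothetical helper: length-k windows of s, step 1
window_substrings <- function(s, k) {
  if (nchar(s) < k) return(character(0))
  starts <- seq_len(nchar(s) - k + 1)
  substring(s, starts, starts + k - 1)
}

subs  <- lapply(strings, window_substrings, k = x)
vocab <- unique(unlist(subs, use.names = FALSE))

# Count matrix: one row per string, one column per subsequence in vocab
counts <- t(vapply(subs,
                   function(s) tabulate(match(s, vocab), nbins = length(vocab)),
                   integer(length(vocab))))
colnames(counts) <- vocab
print(counts)
#         abc bcd cde def efg fgh fgo gop
# String1   1   1   1   1   1   1   0   0
# String2   0   1   1   1   1   0   1   1
```

For a modest number of 200-400 character strings this dense matrix should be manageable; with many strings it becomes wide and mostly zero, so a sparse representation (for example from the Matrix package) may be worth considering.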

It would be great if someone could give me a rough outline of how to
start and which functions to use. I have read through the man pages and
googled a lot, but I still don't know how to approach this.

Best regards,
Immanuel


