[R] Search for locations of subsequences?
Rui Barradas
ruipbarradas at sapo.pt
Wed Aug 29 13:26:21 CEST 2012
Hello,
Function Biostrings::matchPattern can be called with an algorithm =
"boyer-moore" argument.
I've never used it; I found it by searching with sos:
library(sos)
r1 <- findFn('boyer')
r2 <- findFn('moore')
r1 & r2
I have implemented the Boyer-Moore algorithm a couple of times, the
first time in 8086 assembly(!), but I see a difficulty with your
original request.
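For reference, the request as stated can be met without Boyer-Moore by a
plain vectorized scan. The following is only a minimal sketch: find_subseq
is a hypothetical name (not a function from any package), and it assumes
neither vector contains NA values.

```r
# Naive subsequence search: returns every start position of `pattern` in `x`.
# Assumption: no NA values in either vector.
find_subseq <- function(x, pattern) {
  n <- length(x)
  m <- length(pattern)
  if (m == 0L || m > n) return(integer(0))
  # Candidate starts: positions where the first pattern element matches.
  starts <- which(x[seq_len(n - m + 1L)] == pattern[1L])
  # Narrow the candidates one pattern element at a time.
  for (k in seq_len(m - 1L)) {
    if (length(starts) == 0L) break
    starts <- starts[x[starts + k] == pattern[k + 1L]]
  }
  starts
}

find_subseq(1:100, c(49, 50, 51))   # 49
```

It works for numeric or character vectors and reports overlapping matches,
but it does none of the skipping that makes Boyer-Moore attractive.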
A Boyer-Moore search for subsequences of character vectors whose
elements are all single characters (nchar(x) is 1) should be very easy
to implement using the .Call interface, but for integer vectors I don't
see how to implement the bad-character shift table. What would the
alphabet be? The set of 32-bit integers? In that case the table length
would be prohibitive...
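One possible way around the table size, offered as a sketch rather than a
tested solution: store shifts only for the values that occur in the
pattern, and let every other value take the default shift. The code below
uses the Horspool simplification of Boyer-Moore (bad-character rule only)
and is written in plain R rather than through .Call. bmh_search is a
hypothetical name, and the sketch assumes integer-valued input without NAs.

```r
# Boyer-Moore-Horspool search with a sparse bad-character table.
# Assumptions: integer-valued vectors, no NA values.
bmh_search <- function(x, pattern) {
  x <- as.integer(x)
  pattern <- as.integer(pattern)
  n <- length(x)
  m <- length(pattern)
  if (m == 0L || m > n) return(integer(0))

  # Sparse shift table: one entry per distinct value among the first
  # m - 1 pattern elements, keyed by the value as a string. Later
  # positions overwrite earlier ones, so the rightmost occurrence wins.
  shift <- new.env(hash = TRUE)
  for (j in seq_len(m - 1L)) {
    assign(as.character(pattern[j]), m - j, envir = shift)
  }

  hits <- integer(0)
  i <- 1L                            # pattern[1] is aligned with x[i]
  while (i <= n - m + 1L) {
    # Compare right to left, as in Boyer-Moore.
    k <- m
    while (k >= 1L && x[i + k - 1L] == pattern[k]) k <- k - 1L
    if (k == 0L) hits <- c(hits, i)
    # Shift by the entry for the text value under the pattern's last
    # position; values absent from the table shift by the full length m.
    key <- as.character(x[i + m - 1L])
    i <- i + get0(key, envir = shift, inherits = FALSE, ifnotfound = m)
  }
  hits
}

bmh_search(1:100, c(49, 50, 51))   # 49
```

The table never holds more than length(pattern) - 1 entries, whatever the
range of the integers. Whether the hashing overhead beats a naive scan in
practice is a separate question that this sketch does not answer.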
Ideas anyone?
Rui Barradas
On 28-08-2012 22:05, Duncan Murdoch wrote:
> Is there a function to efficiently search for a subsequence within a
> vector?
>
> For example, with
>
> x <- 1:100
>
> I'd like to search for the sequence c(49,50,51), and be told that it
> occurs exactly once, starting at location 49. (The items in the
> vectors might be numeric or character, and there might be repetitions
> within the search pattern or within the vector I'm searching.)
>
> Duncan Murdoch