[R] Search for locations of subsequences?

Rui Barradas ruipbarradas at sapo.pt
Wed Aug 29 13:26:21 CEST 2012


Hello,

The function Biostrings::matchPattern can be called with an
algorithm = "boyer-moore" argument.
I've never used it; it turned up in the result of

library(sos)
r1 <- findFn('boyer')
r2 <- findFn('moore')
r1 & r2

I have implemented the Boyer-Moore algorithm a couple of times, the 
first time(!) in 8086 assembly, but I see a difficulty with your 
original request.
A Boyer-Moore search for subsequences of character vectors whose 
elements all have nchar(x) equal to 1 should be very easy to implement 
using the .Call interface, but for integer vectors I don't see how to 
implement the bad character shift table. What would the alphabet be? 
The set of 32-bit integers? In that case the table length would be 
prohibitive...
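
One possible way around the table size, offered only as a sketch and not 
as a tested solution: the bad character table only needs entries for the 
distinct values that occur in the pattern, since every other value gets 
the default shift of length(pattern). The pure R version below uses the 
simpler Horspool variant of the bad character rule rather than full 
Boyer-Moore. The name find_subseq is hypothetical and not from any 
package, and because the shifts are precomputed for all of x with 
match(), the sublinear behaviour of a real .Call implementation is lost.

```r
# Horspool-style subsequence search (hypothetical helper, not from any package).
# Returns the start positions of `pattern` in `x`.
find_subseq <- function(x, pattern) {
  n <- length(x)
  m <- length(pattern)
  if (m == 0L || m > n) return(integer(0))

  # Bad character table restricted to the distinct values in pattern[1:(m-1)].
  # For each value: distance from its last occurrence to the end of the pattern.
  head_vals <- pattern[-m]
  alphabet  <- unique(head_vals)
  shift_tab <- match(alphabet, rev(head_vals))

  # Shift to use when x[k] is aligned with the last pattern element.
  # Values that never occur in pattern[1:(m-1)] get the full shift m.
  idx   <- match(x, alphabet)
  shift <- ifelse(is.na(idx), m, shift_tab[idx])

  hits <- integer(0)
  i <- 1L                                  # pattern[1] is aligned with x[i]
  while (i <= n - m + 1L) {
    if (isTRUE(all(x[i:(i + m - 1L)] == pattern))) hits <- c(hits, i)
    i <- i + shift[i + m - 1L]
  }
  hits
}

find_subseq(1:100, c(49, 50, 51))          # 49
find_subseq(c(1, 2, 1, 2, 1), c(1, 2, 1))  # 1 3 (overlapping matches)
```

The same function works for character vectors, since match() and == 
do not care about the element type.
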

Ideas anyone?

Rui Barradas

On 28-08-2012 22:05, Duncan Murdoch wrote:
> Is there a function to efficiently search for a subsequence within a 
> vector?
>
> For example, with
>
> x <- 1:100
>
> I'd like to search for the sequence c(49,50,51), and be told that it 
> occurs exactly once, starting at location 49.  (The items in the 
> vectors might be numeric or character, and there might be repetitions 
> within the search pattern or within the vector I'm searching.)
>
> Duncan Murdoch
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



