[R] Faster way to implement this search?

Walter Anderson wandrson01 at gmail.com
Sat Mar 17 13:56:29 CET 2012


On 03/17/2012 12:53 AM, Jeff Newmiller wrote:
>      for(indx in 1:(length(bin.05)-3))
> >>>         if ((bin.05[indx] == test.pattern[1]) &&
> >>>             (bin.05[indx+1] == test.pattern[2]) &&
> >>>             (bin.05[indx+2] == test.pattern[3]))
> >>>           return.values$count.match.pattern[1] =
> >>>             return.values$count.match.pattern[1] + 1
OK, sorry for not understanding the first time. Here is my example, with 
the type of data I am working with in this simulation:

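      test.pattern <- c("T", "T", "O")
      bin.05 <- cut(runif(10000000), breaks=c(-0.01,0.05,1), labels=c("T", "O"))
      count <- 0   # must be initialized before the loop
      # Note: the last full window starts at length(bin.05)-2, so this
      # bound stops one window short of the end.
      for(indx in 1:(length(bin.05)-3))
         if (
             (bin.05[indx] == test.pattern[1]) &&
             (bin.05[indx+1] == test.pattern[2]) &&
             (bin.05[indx+2] == test.pattern[3]))
                 count <- count + 1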

Now, the approach provided by William Dunlap sped up my simulation 
tremendously:

indx <- seq_len(length(bin.05)-3)
count <- sum((bin.05[indx] == test.pattern[1]) &
                        (bin.05[indx+1] == test.pattern[2]) &
                        (bin.05[indx+2] == test.pattern[3]))
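
To see that the vectorized form counts the same windows as the loop, here 
is a small check on a toy vector. The names x, pat, n, loop.count and 
vec.count are made up for this illustration and are not part of the 
simulation; the bound used here is length(x)-2 so that the last window is 
included.

```r
# Toy data: the windows of length 3 are TTO, TOT, OTT, TTT, TTO, TOO
x   <- c("T", "T", "O", "T", "T", "T", "O", "O")
pat <- c("T", "T", "O")
n   <- length(x) - 2          # number of full windows of length 3

# Element-by-element loop
loop.count <- 0
for (i in seq_len(n))
    if (x[i] == pat[1] && x[i+1] == pat[2] && x[i+2] == pat[3])
        loop.count <- loop.count + 1

# Vectorized: compare three shifted copies of x at once
idx <- seq_len(n)
vec.count <- sum((x[idx]   == pat[1]) &
                 (x[idx+1] == pat[2]) &
                 (x[idx+2] == pat[3]))

identical(as.numeric(loop.count), as.numeric(vec.count))
```

Both counts come from the same windows; the vectorized version simply 
evaluates every window in one pass instead of one per loop iteration.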

My current question: is there a way to perform the same count, but with 
a pattern of arbitrary size?  In other words, instead of a fixed pattern 
size of 3, could I have a pattern size of 4, 5, 6, ..., 30, any of which 
could be run without changing the script?
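
One possible direction, offered only as a sketch and not tested against the 
full simulation: keep the vectorized comparison, but loop over the 
positions of the pattern (at most 30 iterations) instead of over the data. 
The function name count.pattern and its arguments x and pattern are 
hypothetical names chosen for this example.

```r
# Count how many windows of x match pattern, for any pattern length.
# x may be a character vector or a factor; pattern is a character vector.
count.pattern <- function(x, pattern) {
    k <- length(pattern)
    n <- length(x) - k + 1            # number of full windows of length k
    if (k == 0 || n < 1) return(0L)   # nothing to match
    idx <- seq_len(n)
    hit <- rep(TRUE, n)               # hit[i]: window starting at i still matches
    for (j in seq_len(k))             # one vectorized comparison per pattern position
        hit <- hit & (x[idx + j - 1] == pattern[j])
    sum(hit)
}

x <- c("T", "T", "O", "T", "T", "T", "O", "O")
count.pattern(x, c("T", "T", "O"))
count.pattern(x, c("T", "T", "T", "O"))
count.pattern(factor(x), c("T", "T", "O"))
```

With this shape the pattern length is taken from the pattern itself, so 
changing the pattern would not require changing the script. Note that the 
window count here is length(x) - k + 1, which includes the final window.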


