[R] Count matches of a sequence in a vector?

David Winsemius dwinsemius at comcast.net
Wed Apr 21 23:44:04 CEST 2010


On Apr 21, 2010, at 5:19 PM, William Dunlap wrote:

>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Jeff Brown
>> Sent: Wednesday, April 21, 2010 8:08 AM
>> To: r-help at r-project.org
>> Subject: Re: [R] Count matches of a sequence in a vector?
>>
>>
>> This sort of calculation can't be vectorized; you'll have to iterate
>> through the sequence, e.g. with a "for" loop.  I don't know if a
>> routine has already been written.
>
> It can be partially vectorized:
> f2 <- function (v, p) {
>    retval <- TRUE
>    i <- seq_len(length(v) - length(p) + 1L) - 1L
>    for (j in seq_along(p)) {
>        retval <- retval & v[i + j] == p[j]
>    }
>    retval
> }

I understood the task to be counting the number of matches, so this
modification would do that:

 > f2 <- function (v, p) {
+    retval <- TRUE
+    i <- seq_len(length(v) - length(p) + 1L) - 1L
+    for (j in seq_along(p)) {
+        # keep only the start positions that still match at offset j
+        retval <- retval & v[i + j] == p[j]
+    }
+    sum(retval)  # number of complete matches
+ }
 > f2(v, vseq)
[1] 1
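
One thing to watch in a counting variant of this loop is operator
precedence: in R, `+` binds more tightly than `==`, so an expression
such as `x == y + z` is evaluated as `x == (y + z)`. The following is a
minimal sketch, not the code timed in this thread; the function name
`count_matches` and the small vector `v_small` are made up for
illustration. It accumulates a logical vector with `&` and sums it at
the end.

```r
# Precedence check: arithmetic is evaluated before comparison.
3 == 2 + 1      # parsed as 3 == (2 + 1)
(3 == 2) + 1    # comparison first, then FALSE + 1

# Hypothetical helper: count the start positions where p occurs in v.
count_matches <- function(v, p) {
  n <- length(v) - length(p) + 1L
  if (n < 1L) return(0L)           # pattern longer than the vector
  i <- seq_len(n) - 1L             # zero-based start offsets
  hit <- rep(TRUE, n)
  for (j in seq_along(p)) {
    # a start position survives only if element j of the pattern matches
    hit <- hit & (v[i + j] == p[j])
  }
  sum(hit)
}

# Made-up data: the pattern 2, 3, 4 starts at positions 2 and 5.
v_small <- c(1, 2, 3, 4, 2, 3, 4, 5, 2, 3)
count_matches(v_small, 2:4)
```

The trailing `2, 3` in `v_small` is not counted because there is no
room for a complete three-element match there.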

That code also outpaces the one I offered earlier, isn't constrained to
a length-three pattern, and may be more memory efficient, although the
benchmark function does not provide feedback on that aspect:

 > benchmark(
+    logsum(v, vseq),
+    summatches(v,vseq),
+    sumroll(v,vseq), f2(v, vseq),
+    order=c('replications', 'elapsed'), replications=1000)
                 test replications elapsed relative user.self sys.self user.child sys.child
4         f2(v, vseq)         1000   0.020     1.00     0.020    0.001          0         0
1     logsum(v, vseq)         1000   0.024     1.20     0.024    0.000          0         0
2 summatches(v, vseq)         1000   0.164     8.20     0.164    0.001          0         0
3    sumroll(v, vseq)         1000   1.023    51.15     1.024    0.005          0         0
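
For anyone who wants to try a comparison like this without the helper
functions from earlier in the thread, here is a rough base-R sketch. It
assumes test data built with the same kind of `sample()` call quoted
below (but a shorter vector), and the function names `count_loop_p` and
`count_loop_v` are hypothetical. It uses `system.time()` for timing and
`object.size()` for a crude look at the size of the intermediate
vectors, which is only a partial view of memory use.

```r
# Assumed test data, modeled on the sample() call quoted in this thread.
set.seed(1)
v <- sample(1:10, size = 1e5, replace = TRUE)
p <- 2:4

# Hypothetical name: loop over the short pattern, vectorized over v.
count_loop_p <- function(v, p) {
  i <- seq_len(length(v) - length(p) + 1L) - 1L
  hit <- TRUE
  for (j in seq_along(p)) hit <- hit & v[i + j] == p[j]
  sum(hit)
}

# Hypothetical name: explicit loop over every start position in v.
count_loop_v <- function(v, p) {
  n <- length(v) - length(p) + 1L
  k <- length(p) - 1L
  total <- 0L
  for (s in seq_len(n)) {
    if (all(v[s:(s + k)] == p)) total <- total + 1L
  }
  total
}

t_p <- system.time(n_p <- count_loop_p(v, p))
t_v <- system.time(n_v <- count_loop_v(v, p))
stopifnot(n_p == n_v)   # both approaches should agree on the count

# user, system and elapsed seconds for each approach
print(rbind(loop_over_p = t_p, loop_over_v = t_v)[, 1:3])

# Rough size of the index vector and of one logical intermediate
i <- seq_len(length(v) - length(p) + 1L) - 1L
print(object.size(i), units = "Kb")
print(object.size(v[i + 1L] == p[1L]), units = "Kb")
```

Timings will vary by machine, so no expected numbers are given here.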

> E.g., for the following data
> set.seed(1)
> v <- sample(1:10, size=1e6, replace=TRUE)
> p <- 2:4
> compare using zoo::rollapply (which loops over the long v)
> f1 <- function(v, p)rollapply(zoo(v), length(p), function(x)all(x==p))
> and f2 (which loops over the short p).  I get
>
>> library(zoo)
>> system.time(r1 <- f1(v,p))
>    user  system elapsed
>   13.17    0.06   13.25
>> system.time(r2 <- f2(v,p))
>    user  system elapsed
>    0.12    0.00    0.12
>> identical(which(r1), which(r2))
> [1] TRUE
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -- 
>> View this message in context:
>> http://n4.nabble.com/Count-matches-of-a-sequence-in-a-vector-tp2019018p2019108.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

David Winsemius, MD
West Hartford, CT


