[R] Count matches of a sequence in a vector?
David Winsemius
dwinsemius at comcast.net
Wed Apr 21 23:44:04 CEST 2010
On Apr 21, 2010, at 5:19 PM, William Dunlap wrote:
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Jeff Brown
>> Sent: Wednesday, April 21, 2010 8:08 AM
>> To: r-help at r-project.org
>> Subject: Re: [R] Count matches of a sequence in a vector?
>>
>>
>> This sort of calculation can't be vectorized; you'll have to iterate
>> through the sequence, e.g. with a "for" loop. I don't know if a
>> routine has already been written.
>
> It can be partially vectorized:
> f2 <- function (v, p) {
> retval <- TRUE
> i <- seq_len(length(v) - length(p) + 1L) - 1L
> for (j in seq_along(p)) {
> retval <- retval & v[i + j] == p[j]
> }
> retval
> }
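I understood the task to be counting the number of matches, so this
modification would do that:
> f2 <- function (v, p) {
+ retval <- TRUE
+ i <- seq_len(length(v) - length(p) + 1L) - 1L
+ for (j in seq_along(p)) {
+ # `&` keeps only the windows that have matched every element so far;
+ # the comparison is parenthesized because `+` and `&` differ in precedence from `==`
+ retval <- retval & (v[i + j] == p[j])
+ }
+ sum(retval)
+ }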
> f2(v, vseq)
[1] 1
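To make the counting logic easier to check by hand, here is a small self-contained sketch on a toy vector. The names count_seq and v_toy are hypothetical and only for illustration; the accumulation is the logical-AND approach from Bill's f2 followed by sum(). It uses `&` rather than `+` inside the loop because `+` binds more tightly than `==` in R, so an unparenthesized `x == y + acc` compares x against y + acc.

```r
# Hypothetical helper: count how often pattern p occurs as a contiguous run in v.
count_seq <- function(v, p) {
  n <- length(v) - length(p) + 1L            # number of candidate windows
  # Assumption for this sketch: an empty or too-long pattern counts as 0 matches.
  if (length(p) == 0L || n < 1L) return(0L)
  i <- seq_len(n) - 1L                       # zero-based start offset of each window
  hit <- rep(TRUE, n)
  for (j in seq_along(p)) {
    # A window stays TRUE only if its j-th element equals p[j]
    hit <- hit & (v[i + j] == p[j])
  }
  sum(hit)
}

v_toy <- c(1, 2, 3, 4, 2, 3, 4, 9, 2, 3)
count_seq(v_toy, 2:4)   # the windows starting at positions 2 and 5 match, so this is 2
```

Overlapping occurrences are counted separately, e.g. count_seq(c(1, 1, 1), c(1, 1)) gives 2.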
That code also outpaces the earlier one I offered, isn't constrained
to a length-three pattern, and may be more memory efficient, although
the benchmark function does not report memory use:
> benchmark(
+ logsum(v, vseq),
+ summatches(v,vseq),
+ sumroll(v,vseq), f2(v, vseq),
+ order=c('replications', 'elapsed'), replications=1000)
                 test replications elapsed relative user.self sys.self user.child sys.child
4         f2(v, vseq)         1000   0.020     1.00     0.020    0.001          0         0
1     logsum(v, vseq)         1000   0.024     1.20     0.024    0.000          0         0
2 summatches(v, vseq)         1000   0.164     8.20     0.164    0.001          0         0
3    sumroll(v, vseq)         1000   1.023    51.15     1.024    0.005          0         0
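The functions logsum, summatches, and sumroll are not defined in this message, so the benchmark above cannot be rerun from here as is. As a rough stand-in using only base R, the sketch below times the loop-over-the-pattern approach against a naive loop over every window. The names count_vec, count_loop, v_big, and p_big are hypothetical, the data size is reduced so the naive loop finishes quickly, and the timings will differ by machine.

```r
# Vectorized over v, looping only over the short pattern p (same idea as f2).
# Assumes length(v) >= length(p).
count_vec <- function(v, p) {
  i <- seq_len(length(v) - length(p) + 1L) - 1L
  hit <- TRUE
  for (j in seq_along(p)) hit <- hit & (v[i + j] == p[j])
  sum(hit)
}

# Naive reference: an explicit loop over every window of v.
count_loop <- function(v, p) {
  n <- length(v) - length(p) + 1L
  k <- length(p) - 1L
  total <- 0L
  for (s in seq_len(n)) {
    if (all(v[s:(s + k)] == p)) total <- total + 1L
  }
  total
}

set.seed(1)
v_big <- sample(1:10, size = 1e5, replace = TRUE)  # smaller than 1e6 to keep the loop quick
p_big <- 2:4

t_vec  <- system.time(n_vec  <- count_vec(v_big, p_big))
t_loop <- system.time(n_loop <- count_loop(v_big, p_big))

stopifnot(n_vec == n_loop)  # the two approaches should agree before timings mean anything
print(c(vectorized = t_vec[["elapsed"]], loop = t_loop[["elapsed"]]))
```

Checking that both counts agree before comparing elapsed times guards against benchmarking a fast but wrong function.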
> E.g., for the following data
> set.seed(1)
> v <- sample(1:10, size=1e6, replace=TRUE)
> p <- 2:4
> compare using zoo::rollapply (which loops over the long v)
> f1 <- function(v, p)rollapply(zoo(v), length(p), function(x)all(x==p))
> and f2 (which loops over the short p). I get
>
>> library(zoo)
>> system.time(r1 <- f1(v,p))
> user system elapsed
> 13.17 0.06 13.25
>> system.time(r2 <- f2(v,p))
> user system elapsed
> 0.12 0.00 0.12
>> identical(which(r1), which(r2))
> [1] TRUE
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> --
>> View this message in context:
>> http://n4.nabble.com/Count-matches-of-a-sequence-in-a-vector-tp2019018p2019108.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
David Winsemius, MD
West Hartford, CT