[R] Faster way to implement this search?
William Dunlap
wdunlap at tibco.com
Sun Mar 18 21:48:12 CET 2012
> My current question is there a way to perform the same count, but with
> an arbitrary size pattern. In other words, instead of a fixed pattern
> size of 3, could I have a pattern size of 4, 5, 6, ..., 30 any of which
> that could be run without changing the script?
Of course you cannot do this without changing your script. However,
if you make a function out of it then you can change the function definition
to be more flexible and not have to change any calls to it.
Change your function from
   f <- function(x, test.pattern) {
      indx <- seq_len(length(x)-3) # 3 should be 2
      sum((x[indx] == test.pattern[1]) & (x[indx+1] == test.pattern[2]) & (x[indx+2] == test.pattern[3]))
   }
to
   f <- function (x, test.pattern) {
      if (length(x) < length(test.pattern)) {
         0 # degenerate cases
      } else {
         indx <- seq_len(length(x) - length(test.pattern) + 1)
         match <- x[indx] == test.pattern[1]
         for (i in seq_len(length(test.pattern) - 1)) {
            match <- match & x[indx + i] == test.pattern[1 + i]
         }
         sum(match)
      }
   }
Give the function a name that is meaningful and memorable to you
and use it instead of copying the idiom in it when you need to do a search.
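As a small illustration, here is a sketch of the same generalized logic under a hypothetical name, count_pattern, applied to a short made-up vector so the counts can be checked by hand. The name and the test vector x are assumptions for the example only; they do not come from the original simulation.

```r
# Hypothetical name for the generalized function; the logic is unchanged.
count_pattern <- function(x, test.pattern) {
  if (length(x) < length(test.pattern)) {
    return(0) # pattern longer than the data: no match is possible
  }
  # Every position where a full-length window still fits inside x
  indx <- seq_len(length(x) - length(test.pattern) + 1)
  match <- x[indx] == test.pattern[1]
  # Narrow the matches one pattern element at a time
  for (i in seq_len(length(test.pattern) - 1)) {
    match <- match & x[indx + i] == test.pattern[1 + i]
  }
  sum(match)
}

# Made-up data, small enough to verify by eye
x <- c("T", "T", "O", "T", "T", "O", "O")

count_pattern(x, c("T", "T", "O"))       # 2 (windows starting at 1 and 4)
count_pattern(x, c("T", "T", "O", "O"))  # 1 (window starting at 4)
count_pattern(x, rep("T", 30))           # 0 (pattern longer than x)
```

The calls differ only in the pattern passed in, so a pattern of length 3, 4, or 30 needs no change to the function itself.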
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Walter Anderson
> Sent: Saturday, March 17, 2012 5:56 AM
> To: Jeff Newmiller
> Cc: R Help
> Subject: Re: [R] Faster way to implement this search?
>
> On 03/17/2012 12:53 AM, Jeff Newmiller wrote:
> > for(indx in 1:(length(bin.05)-3))
> > >>> if ((bin.05[indx] == test.pattern[1])&& (bin.05[indx+1] ==
> > >>> test.pattern[2])&& (bin.05[indx+2] == test.pattern[3]))
> > >>> return.values$count.match.pattern[1] =
> > >>> return.values$count.match.pattern[1] + 1
> Ok, sorry for not understanding the first time, here is my example with
> the type of data I am working with in this simulation
>
> test.pattern <- c("T", "T", "O")
> bin.05 <- cut(runif(10000000), breaks=c(-0.01,0.05,1), labels=c("T",
> "O"))
> for(indx in 1:(length(bin.05)-3))
> if (
> (bin.05[indx] == test.pattern[1]) &&
> (bin.05[indx+1] == test.pattern[2]) &&
> (bin.05[indx+2] == test.pattern[3]))
> count <- count + 1
>
> Now the approach provided by William Dunlap sped up my simulation
> tremendously;
>
> indx <- seq_len(length(bin.05)-3)
> count <- sum((bin.05[indx] == test.pattern[1]) &
> (bin.05[indx+1] == test.pattern[2]) &
> (bin.05[indx+2] == test.pattern[3]))
>
> My current question is there a way to perform the same count, but with
> an arbitrary size pattern. In other words, instead of a fixed pattern
> size of 3, could I have a pattern size of 4, 5, 6, ..., 30 any of which
> that could be run without changing the script?