[R] Faster way to implement this search?
William Dunlap
wdunlap at tibco.com
Fri Mar 16 18:31:04 CET 2012
You didn't show your complete code but the following may help you speed things up.
Compare a function, f0, structured like your code and one, f1, that calls sum once
instead of counting length(x)-3 times.
f0 <- function(x, test.pattern) {
count <- 0
for(indx in seq_len(length(x)-3)) {
if ((x[indx] == test.pattern[1]) && (x[indx+1] == test.pattern[2]) && (x[indx+2] == test.pattern[3])) {
count <- count + 1
}
}
count
}
f1 <- function(x, test.pattern) {
indx <- seq_len(length(x)-3)
sum((x[indx] == test.pattern[1]) & (x[indx+1] == test.pattern[2]) & (x[indx+2] == test.pattern[3]))
}
> bin.05 <- round((log10(1:10000000)%%1e-3 - log10(1:10000000)%%1e-4) * 1e4) # quasi-random sample of 10^7 from {0,...,9}
> system.time(print(f0(bin.05, c(2,3,3))))
[1] 3194
user system elapsed
14.35 0.00 14.35
> system.time(print(f1(bin.05, c(2,3,3))))
[1] 3194
user system elapsed
0.70 0.21 0.90
You are probably also slowing things down by doing
yourList$yourCounts[1] <- yourList$yourCounts[1] + 1
many times instead of
count <- yourList$yourCounts[1]
once and
count <- count + 1
many times. The former evaluates $, [, $<-, and [<- many
times and the $<- and [<- in particular may use a fair bit of time.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Walter Anderson
> Sent: Friday, March 16, 2012 10:00 AM
> To: R Help
> Subject: [R] Faster way to implement this search?
>
> I am working on a simulation where I need to count the number of matches
> for an arbitrary pattern in a large sequence of binomial factors. My
> current code is
>
> for(indx in 1:(length(bin.05)-3))
> if ((bin.05[indx] == test.pattern[1]) && (bin.05[indx+1] ==
> test.pattern[2]) && (bin.05[indx+2] == test.pattern[3]))
> return.values$count.match.pattern[1] =
> return.values$count.match.pattern[1] + 1
>
> Since I am running the above code for each simulation multiple times on
> sequences of 10,000,000 factors the code is taking longer than I would
> like. Is there a better (more "R" way of achieving the same answer?
>
> Walter Anderson
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list