[R] Faster way to implement this search?
Walter Anderson
wandrson01 at gmail.com
Fri Mar 16 23:41:56 CET 2012
On 03/16/2012 12:31 PM, William Dunlap wrote:
> You didn't show your complete code but the following may help you speed things up.
> Compare a function, f0, structured like your code and one, f1, that calls sum once
> instead of counting length(x)-3 times.
>
> f0 <- function(x, test.pattern) {
>   count <- 0
>   for (indx in seq_len(length(x) - 3)) {
>     if ((x[indx] == test.pattern[1]) && (x[indx+1] == test.pattern[2]) && (x[indx+2] == test.pattern[3])) {
>       count <- count + 1
>     }
>   }
>   count
> }
>
> f1 <- function(x, test.pattern) {
>   indx <- seq_len(length(x) - 3)
>   sum((x[indx] == test.pattern[1]) & (x[indx+1] == test.pattern[2]) & (x[indx+2] == test.pattern[3]))
> }
>
>
> > bin.05 <- round((log10(1:10000000)%%1e-3 - log10(1:10000000)%%1e-4) * 1e4) # quasi-random sample of 10^7 from {0,...,9}
> > system.time(print(f0(bin.05, c(2,3,3))))
> [1] 3194
>    user  system elapsed
>   14.35    0.00   14.35
> > system.time(print(f1(bin.05, c(2,3,3))))
> [1] 3194
>    user  system elapsed
>    0.70    0.21    0.90
>
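The vectorized idea in f1 extends to a pattern of any length. The following is a minimal sketch, not code from this thread: `count_pattern` is a hypothetical name, and it assumes every start position whose window fits inside `x` should be tested. Under that assumption the last start is `length(x) - length(pattern) + 1`, which is one position further than the `length(x) - 3` bound used with the three-element pattern above. Whether the shorter bound was intended cannot be told from the code shown.

```r
# Hypothetical helper (not from the thread): count occurrences of `pattern`
# in `x` by AND-ing one shifted, vectorized comparison per pattern element.
count_pattern <- function(x, pattern) {
  k <- length(pattern)
  n_starts <- length(x) - k + 1L          # number of windows that fit in x
  if (k == 0L || n_starts < 1L) return(0L)
  starts <- seq_len(n_starts)
  hit <- rep(TRUE, n_starts)
  for (j in seq_len(k)) {
    # compare element j of the pattern against x shifted by j - 1
    hit <- hit & (x[starts + j - 1L] == pattern[j])
  }
  sum(hit)                                # overlapping matches are all counted
}

x <- c(2, 3, 3, 1, 2, 3, 3)
count_pattern(x, c(2, 3, 3))              # 2: windows starting at 1 and 5
```

The loop runs once per pattern element rather than once per element of `x`, so the expensive work stays inside vectorized comparisons, as in f1.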
> You are probably also slowing things down by doing
>     yourList$yourCounts[1] <- yourList$yourCounts[1] + 1
> many times instead of
>     count <- yourList$yourCounts[1]
> once and
>     count <- count + 1
> many times. The former evaluates $, [, $<-, and [<- many
> times, and the $<- and [<- in particular may use a fair bit of time.
>
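For a loop that has to stay a loop, the advice above amounts to reading the list element once, counting in a plain local variable, and writing the result back once. A small sketch under stated assumptions: the list `return.values` and its element `count.match.pattern` are modeled on the original post, the sample data are made up, and the bound `length(x) - 2` is chosen here so that the final three-element window is included.

```r
# Illustrative data; the list mirrors the names used in the original post.
return.values <- list(count.match.pattern = c(0, 0))
x <- c(2, 3, 3, 1, 2, 3, 3)
test.pattern <- c(2, 3, 3)

count <- return.values$count.match.pattern[1]    # evaluate $ and [ once
for (indx in seq_len(length(x) - 2)) {
  if (x[indx] == test.pattern[1] &&
      x[indx + 1] == test.pattern[2] &&
      x[indx + 2] == test.pattern[3]) {
    count <- count + 1                           # plain local update
  }
}
return.values$count.match.pattern[1] <- count    # evaluate $<- and [<- once

return.values$count.match.pattern[1]             # 2
```

The loop body now touches only a local numeric value, so the replacement functions run once instead of once per match.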
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
>> Of Walter Anderson
>> Sent: Friday, March 16, 2012 10:00 AM
>> To: R Help
>> Subject: [R] Faster way to implement this search?
>>
>> I am working on a simulation where I need to count the number of matches
>> for an arbitrary pattern in a large sequence of binomial factors. My
>> current code is
>>
>> for(indx in 1:(length(bin.05)-3))
>>     if ((bin.05[indx] == test.pattern[1]) && (bin.05[indx+1] == test.pattern[2]) && (bin.05[indx+2] == test.pattern[3]))
>>         return.values$count.match.pattern[1] = return.values$count.match.pattern[1] + 1
>>
>> Since I am running the above code for each simulation multiple times on
>> sequences of 10,000,000 factors the code is taking longer than I would
>> like. Is there a better (more "R") way of achieving the same answer?
>>
>> Walter Anderson
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
Thank you for this response. That made a huge improvement in my
simulation speed!