[R] apply is slower than for loop?

Allan Engelhardt allane at cybaea.com
Sat Jul 10 15:44:11 CEST 2010



On 09/07/10 21:19, Duncan Murdoch wrote:
> On 09/07/2010 4:11 PM, Gene Leynes wrote:
>> I thought the "apply" functions are faster than for loops, but my most
>> recent test shows that apply actually takes significantly longer than a
>> for loop.  Am I missing something?
>
> Probably not.  apply() needs to figure out the shape of the results it 
> gets from each row in order to put them into the final result matrix.  
> You know that in advance and set up the result to hold them, so your 
> calculation is more efficient.

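Plus, as it is set up, the apply() call has to build large temporary 
matrices: 1+ds creates a full-size copy of ds before apply() even starts, 
and apply() then makes its own transposed copy internally (via aperm()).  
If you precompute 1+ds once and compare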

ds1 <- 1 + ds   # compute 1+ds once, outside the benchmarked calls
library("rbenchmark")
benchmark(apply=apply(1+ds, 1, cumprod),
           forloop=for (i in 1:nrow(ds)){ y2[i,]<-cumprod(1+ds[i,]) },
           apply2=apply(ds1, 1, cumprod),
           replications=1, columns=c("test","elapsed","relative","user.self","sys.self"), order="elapsed")
#      test elapsed relative user.self sys.self
# 2 forloop   1.863 1.000000     1.861    0.000
# 3  apply2   2.175 1.167472     1.934    0.239
# 1   apply   2.443 1.311326     2.108    0.334



you can see some of that impact: computing 1+ds once beforehand (apply2) 
already closes part of the gap to the for loop.
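Duncan's point about knowing the result shape in advance can also be handed to the *apply family: vapply() takes a FUN.VALUE template describing what each call returns, so the result can be allocated up front and each return value is checked against that template. A minimal sketch on a small random matrix (m, y_loop and y_vapply are hypothetical names, and this has not been timed against the full-size ds here):

```r
set.seed(1)                                           # reproducible toy data
m <- matrix(rnorm(20, sd = 0.1), nrow = 4, ncol = 5)  # hypothetical small stand-in for ds

# Preallocated for loop: the shape of the result is fixed up front
y_loop <- matrix(NA_real_, nrow(m), ncol(m))
for (i in seq_len(nrow(m))) {
  y_loop[i, ] <- cumprod(1 + m[i, ])
}

# vapply(): FUN.VALUE = numeric(ncol(m)) declares that every call returns
# ncol(m) doubles, so the result matrix is allocated before the first call
y_vapply <- vapply(seq_len(nrow(m)),
                   function(i) cumprod(1 + m[i, ]),
                   FUN.VALUE = numeric(ncol(m)))

dim(y_vapply)                   # [1] 5 4  (one column per row of m)
identical(t(y_vapply), y_loop)  # [1] TRUE
```

Like apply() and sapply(), vapply() stores each row's result as a column, hence the t() before comparing.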

But if it is speed you are after, sapply() may be even faster (you'll need 
to t() the result again, since sapply() also returns one column per row):

benchmark(forloop=for (i in 1:nrow(ds)){ y2[i,]<-cumprod(1+ds[i,]) },
           sapply={dsone<-1+ds;sapply(1:NROW(dsone), function(i) cumprod(dsone[i,]))},
           replications=1, columns=c("test","elapsed","relative","user.self","sys.self"), order="elapsed")
#      test elapsed relative user.self sys.self
# 2  sapply   1.539 1.000000     1.300    0.239
# 1 forloop   1.878 1.220273     1.878    0.000
zz <- sapply(1:NROW(ds1), function(i) cumprod(ds1[i,]))  # each row's result becomes a column
identical(t(zz), y2)
# [1] TRUE
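A further option, not benchmarked in this thread (so any speed-up is an assumption to check on your own data): a row-wise cumulative product can also be built one column at a time, since column j of the result is column j-1 of the result times column j of 1+ds. With the original settings that replaces 1e5 row iterations by 200 column iterations (ncol(ds) = T/dt = 200), each one a single vectorized multiply over all rows. A minimal sketch on toy data; m, y_rows and y_cols are hypothetical names:

```r
set.seed(42)                                          # reproducible toy data
m <- matrix(rnorm(30, sd = 0.1), nrow = 6, ncol = 5)  # hypothetical stand-in for ds

# Row by row, as in the for loop benchmarked above
y_rows <- matrix(NA_real_, nrow(m), ncol(m))
for (i in seq_len(nrow(m))) y_rows[i, ] <- cumprod(1 + m[i, ])

# Column by column: start from 1 + m, then multiply each column by the
# running product held in the previous column, for all rows at once
y_cols <- 1 + m
for (j in seq_len(ncol(y_cols))[-1]) {
  y_cols[, j] <- y_cols[, j - 1] * y_cols[, j]
}

all.equal(y_rows, y_cols)  # [1] TRUE
```

all.equal() is used rather than identical() because the two methods need not round identically in the last bit.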


Hope this helps a little.

Allan

>
> The *apply functions are designed to be convenient and clear to read, 
> not necessarily fast.
>
> Duncan Murdoch
>
>> It doesn't matter much whether I do the calculations column-wise or 
>> row-wise.
>>
>> ## Example of how apply is SLOWER than for loop:
>>
>> #rm(list=ls())
>>
>> ## DEFINE VARIABLES
>> mu=0.05 ; sigma=0.20 ; dt=.25 ; T=50 ; sims=1e5
>> timesteps = T/dt
>>
>> ## MAKE PHI AND DS
>> phi = matrix(rnorm(timesteps*sims), nrow=sims, ncol=timesteps)
>> ds = mu*dt + sigma * sqrt(dt) * phi
>>
>> ## USE APPLY TO CALCULATE ROWWISE CUMULATIVE PRODUCT
>> system.time(y1 <- apply(1+ds, 1, cumprod))
>> ## TRANSPOSE Y1: apply() OVER ROWS RETURNS EACH ROW'S RESULT AS A COLUMN
>> y1=t(y1)
>>
>> ## USE FOR LOOP TO CALCULATE ROWWISE CUMULATIVE PRODUCT
>> y2=matrix(NA,nrow(ds),ncol(ds))
>> system.time(
>>     for (i in 1:nrow(ds)){
>>         y2[i,]<-cumprod(1+ds[i,])
>>     }
>> )
>>
>> ## COMPARE RESULTS TO MAKE SURE THEY DID THE SAME THING
>> str(y1)
>> str(y2)
>> all(y1==y2)
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>


