[Rd] Speeding up sum and prod

Mon Aug 23 20:54:46 CEST 2010

On Aug 23, 2010, at 1:19 PM, Radford Neal wrote:

> Looking for more ways to speed up R, I've found that large
> improvements are possible in the speed of "sum" and "prod" for long
> real vectors.  
> 

The results are likely very compiler- and architecture specific. On my machine [x86_64, OS X 10.6] your version is actually slower for narm (the more you optimize the more assumptions you are making which may turn out to be false):

baseline:
> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
   user  system elapsed 
  2.412   0.018   2.431 
> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
   user  system elapsed 
  2.510   0.001   2.509 

RN version:
> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
   user  system elapsed 
  1.691   0.004   1.694 
> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
   user  system elapsed 
  3.527   0.001   3.528 

If you simply take out the narm check and don't mess with updated you get to a more reasonable:
> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
   user  system elapsed 
  1.688   0.003   1.691 
> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
   user  system elapsed 
  2.522   0.000   2.522 

But just to bring things into perspective -- simply changing the compiler [here just for that one file with still the same optimization level] will give you:
> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
   user  system elapsed 
  5.098   0.003   5.102 
> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
   user  system elapsed 
  5.213   0.000   5.214 

... and using the original (unmodified) R code with a more optimized flags will give you
> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
   user  system elapsed 
  1.835   0.003   1.838 
> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
   user  system elapsed 
  2.473   0.001   2.474 

... and the "optimal" version above (3rd) with the same optimization settings:

> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
   user  system elapsed 
  1.670   0.003   1.673 
> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
   user  system elapsed 
  5.555   0.001   5.556 

... so you just can't win ... 

Cheers,
Simon

> Here is a little test with R version 2.11.1 on an Intel Linux system
> 
>> a <- seq(0,1,length=1000)
>> system.time({for (i in 1:1000000) b <- sum(a)})
>   user  system elapsed
>  4.800   0.010   4.817
>> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
>   user  system elapsed
>  8.240   0.030   8.269
> 
> and here is the same with "sum" and "prod" modified as described below:
> 
>> a <- seq(0,1,length=1000)
>> system.time({for (i in 1:1000000) b <- sum(a)})
>   user  system elapsed
>   1.81    0.00    1.81
>> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
>   user  system elapsed
>  7.250   0.010   7.259
> 
> That's an improvement by a factor of 2.65 for real vectors of length
> 1000 with na.rm=FALSE (the default), and an improvement of 12% when
> na.rm=TRUE.  Of course, the improvement is smaller for very short
> vectors.
> 
> The biggest reason for the improvement is that the current code (in
> 2.11.1 and in the development release of 2010-08-19) makes a costly
> call of ISNAN even when the option is na.rm=FALSE.  The inner loop
> can also be sped up a bit in other respects.
> 
> Here is the old procedure, in src/main/summary.c:
> 
> static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
> {
>    LDOUBLE s = 0.0;
>    int i;
>    Rboolean updated = FALSE;
> 
>    for (i = 0; i < n; i++) {
>        if (!ISNAN(x[i]) || !narm) {
>            if(!updated) updated = TRUE;
>            s += x[i];
>        }
>    }
>    *value = s;
> 
>    return(updated);
> }
> 
> and here is my modified version:
> 
> static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
> {
>    LDOUBLE s = 0.0;
>    int i;
>    Rboolean updated = FALSE;
> 
>    if (narm) {
>        for (i = 0; i < n; i++) {
>            if (!ISNAN(x[i])) {
>                s += x[i];
>                updated = TRUE;
>                break;
>            }
>        }
>        for (i = i+1; i < n; i++) {
>            if (!ISNAN(x[i]))
>                s += x[i];
>        }
>    } else {
>        for (i = 0; i < n; i++)
>            s += x[i];
>        if (n>0) updated = TRUE;
>    }
> 
>    *value = s;
> 
>    return(updated);
> }
> 
> An entirely analogous improvement can be made to the "prod" function.
> 
>   Radford Neal
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
>