[Rd] Speeding up sum and prod
Radford Neal
radford at cs.toronto.edu
Mon Aug 23 19:19:01 CEST 2010
Looking for more ways to speed up R, I've found that large
improvements are possible in the speed of "sum" and "prod" for long
real vectors.
Here is a little test with R version 2.11.1 on an Intel Linux system:
> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
user system elapsed
4.800 0.010 4.817
> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
user system elapsed
8.240 0.030 8.269
and here is the same with "sum" and "prod" modified as described below:
> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
user system elapsed
1.81 0.00 1.81
> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
user system elapsed
7.250 0.010 7.259
That's a speedup by a factor of 2.65 for real vectors of length
1000 with na.rm=FALSE (the default), and a 12% reduction in time when
na.rm=TRUE. Of course, the improvement is smaller for very short
vectors.
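The quoted figures are consistent with the "user" times from the two runs (the first ratio for na.rm=FALSE, the second the fractional reduction for na.rm=TRUE):

```latex
\frac{4.800\ \text{s}}{1.81\ \text{s}} \approx 2.65
\qquad\text{and}\qquad
1 - \frac{7.250\ \text{s}}{8.240\ \text{s}} \approx 0.12
```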
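The biggest reason for the improvement is that the current code (in
2.11.1 and in the development release of 2010-08-19) makes a costly
call of ISNAN for every element even when the option is na.rm=FALSE,
because the condition tests !ISNAN(x[i]) before it tests !narm. The
inner loop can also be sped up a bit in other respects, for example by
not checking the "updated" flag on every iteration.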
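The effect of operand order can be seen with a small standalone sketch. It uses a hypothetical call-counting wrapper, counted_isnan, in place of R's ISNAN, and plain double accumulation instead of LDOUBLE; the function names are made up for this illustration. Swapping the operands so that !narm comes first lets || short-circuit when na.rm=FALSE. That is a simpler fix than the modified rsum in this message, which also avoids testing narm and "updated" on every element.

```c
#include <math.h>

/* Hypothetical stand-in for ISNAN that counts how often it is evaluated. */
static long isnan_calls = 0;

static int counted_isnan(double x)
{
    isnan_calls++;
    return isnan(x) != 0;
}

/* Condition order as in the 2.11.1 rsum: the ISNAN test comes first,
   so it is evaluated for every element whatever the value of narm.
   Returns the number of ISNAN evaluations; the sum goes to *sum. */
long sum_isnan_first(const double *x, int n, int narm, double *sum)
{
    double s = 0.0;
    isnan_calls = 0;
    for (int i = 0; i < n; i++)
        if (!counted_isnan(x[i]) || !narm)
            s += x[i];
    *sum = s;
    return isnan_calls;
}

/* Operands swapped: when narm is false, || short-circuits and the
   ISNAN test is never evaluated. */
long sum_narm_first(const double *x, int n, int narm, double *sum)
{
    double s = 0.0;
    isnan_calls = 0;
    for (int i = 0; i < n; i++)
        if (!narm || !counted_isnan(x[i]))
            s += x[i];
    *sum = s;
    return isnan_calls;
}
```

With three finite elements and narm false, the first version evaluates ISNAN three times and the second not at all; both give the same sum.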
Here is the old procedure, in src/main/summary.c:
static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
{
    LDOUBLE s = 0.0;
    int i;
    Rboolean updated = FALSE;
    for (i = 0; i < n; i++) {
        if (!ISNAN(x[i]) || !narm) {
            if(!updated) updated = TRUE;
            s += x[i];
        }
    }
    *value = s;
    return(updated);
}
and here is my modified version:
static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
{
    LDOUBLE s = 0.0;
    int i;
    Rboolean updated = FALSE;
    if (narm) {
        /* find the first non-NaN element and set 'updated' once */
        for (i = 0; i < n; i++) {
            if (!ISNAN(x[i])) {
                s += x[i];
                updated = TRUE;
                break;
            }
        }
        /* add the remaining non-NaN elements (no-op if none was found) */
        for (i = i+1; i < n; i++) {
            if (!ISNAN(x[i]))
                s += x[i];
        }
    } else {
        /* na.rm=FALSE: no ISNAN calls at all */
        for (i = 0; i < n; i++)
            s += x[i];
        if (n>0) updated = TRUE;
    }
    *value = s;
    return(updated);
}
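To check that the two versions behave the same, here is a sketch that compiles both outside R. The definitions of LDOUBLE, ISNAN and Rboolean are simplified stand-ins for R's internal ones (R's ISNAN must also be true for NA_real_, which is itself a NaN), and the names rsum_old, rsum_new and rsum_agree are hypothetical, chosen for this example.

```c
#include <math.h>

/* Simplified stand-ins for R's internal definitions (assumptions). */
#define LDOUBLE long double
#define ISNAN(x) isnan(x)
typedef enum { FALSE = 0, TRUE } Rboolean;

/* The 2.11.1 version, renamed. */
static Rboolean rsum_old(double *x, int n, double *value, Rboolean narm)
{
    LDOUBLE s = 0.0;
    int i;
    Rboolean updated = FALSE;
    for (i = 0; i < n; i++) {
        if (!ISNAN(x[i]) || !narm) {
            if (!updated) updated = TRUE;
            s += x[i];
        }
    }
    *value = s;
    return updated;
}

/* The modified version, renamed. */
static Rboolean rsum_new(double *x, int n, double *value, Rboolean narm)
{
    LDOUBLE s = 0.0;
    int i;
    Rboolean updated = FALSE;
    if (narm) {
        for (i = 0; i < n; i++) {
            if (!ISNAN(x[i])) {
                s += x[i];
                updated = TRUE;
                break;
            }
        }
        for (i = i + 1; i < n; i++) {
            if (!ISNAN(x[i]))
                s += x[i];
        }
    } else {
        for (i = 0; i < n; i++)
            s += x[i];
        if (n > 0) updated = TRUE;
    }
    *value = s;
    return updated;
}

/* Hypothetical helper: returns 1 if both versions return the same flag
   and the same sum (two NaN sums count as equal), otherwise 0. */
int rsum_agree(double *x, int n, Rboolean narm)
{
    double v_old, v_new;
    Rboolean u_old = rsum_old(x, n, &v_old, narm);
    Rboolean u_new = rsum_new(x, n, &v_new, narm);
    if (u_old != u_new)
        return 0;
    if (isnan(v_old) || isnan(v_new))
        return isnan(v_old) && isnan(v_new);
    return v_old == v_new;
}
```

Both versions add the elements in the same order into the same accumulator, so for finite inputs the sums should be identical, not merely close. With na.rm=FALSE a NaN element makes both sums NaN, which is why rsum_agree treats two NaN results as equal.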
An entirely analogous improvement can be made to the "prod" function.
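A sketch of what the analogous "prod" change might look like, assuming rprod takes the same arguments as rsum and differs only in starting from 1.0 and using *= instead of +=. This is not the actual patch for "prod"; the stand-in macros and the Rboolean type are simplified assumptions, as in a standalone build outside R.

```c
#include <math.h>

/* Simplified stand-ins for R's internal definitions (assumptions). */
#define LDOUBLE long double
#define ISNAN(x) isnan(x)
typedef enum { FALSE = 0, TRUE } Rboolean;

/* Hypothetical rprod with the same loop split as the modified rsum. */
static Rboolean rprod(double *x, int n, double *value, Rboolean narm)
{
    LDOUBLE s = 1.0;            /* empty product is 1 */
    int i;
    Rboolean updated = FALSE;
    if (narm) {
        /* find the first non-NaN element and set 'updated' once */
        for (i = 0; i < n; i++) {
            if (!ISNAN(x[i])) {
                s *= x[i];
                updated = TRUE;
                break;
            }
        }
        /* multiply in the remaining non-NaN elements */
        for (i = i + 1; i < n; i++) {
            if (!ISNAN(x[i]))
                s *= x[i];
        }
    } else {
        /* no ISNAN calls; a NaN element propagates through *= */
        for (i = 0; i < n; i++)
            s *= x[i];
        if (n > 0) updated = TRUE;
    }
    *value = s;
    return updated;
}
```

For example, with x = {2, NaN, 3, 0.5} this gives 3 when narm is TRUE and NaN when narm is FALSE.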
Radford Neal