performance of apply
Douglas Bates
bates@stat.wisc.edu
29 May 1998 09:05:40 -0500
Andreas Weingessel <Andreas.Weingessel@ci.tuwien.ac.at> writes:
> I noticed that apply is VERY SLOW when applied over a "large"
> dimension, for example when computing the row sums of a matrix with
> thousands of rows.
>
> To demonstrate it, I did some benchmarking for different methods of
> computing the row sums of an n x 10 matrix with n = 2000, ..., 10000.
>
> The first method (M1) I used is the normal apply command:
> y <- apply(x,1,sum)
> The second method (M2) uses a for-loop for the computations, where the
> memory for the resulting vector has been allocated before. That is, for
> n = 2000:
> z <- numeric(2000); for (i in 1:2000) z[i] <- sum(x[i,])
> The third method (M3) also uses a for-loop, but the resulting vector
> is grown incrementally by concatenation, i.e.
> z1 <- NULL; for (i in 1:2000) z1 <- c(z1,sum(x[i,]))
>
> All computations were made on a Pentium II 233 MHz with 256 MB of RAM,
> with R started as R -v 50. The following table shows the minimum, mean,
> and maximum CPU time in seconds, as measured by system.time over 10 runs
> of each computation, for different values of n.
>
>                  M1                     M2                   M3
>      n     min    mean     max    min   mean    max    min   mean    max
>   2000    4.03    4.16    4.34   0.27   0.40   0.47   0.51   0.63   0.71
>   4000   12.65   13.40   14.68   0.73   0.81   0.94   1.78   1.86   1.98
>   6000   26.51   28.14   29.50   1.19   1.22   1.38   3.79   3.80   3.80
>   8000   52.06   63.43   67.61   1.46   1.63   1.69   6.38   6.41   6.58
>  10000   84.06   98.17  118.94   1.93   2.01   2.13   9.78   9.79   9.81
>
> That is, computing the row sums of a 10000 x 10 matrix with apply
> takes about 100 sec on average, whereas a simple for-loop does the
> same job in about 2 sec.
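A minimal sketch of the three benchmarked methods, assuming a current R installation. The matrix contents (uniform random numbers from runif() with a fixed seed) are an assumption, since the post does not say what x contained, and timings on modern hardware will be far smaller than the 1998 figures. The M3 loop is expensive because c(z1, ...) allocates and copies a new, longer vector on every iteration, so its total copying work grows roughly quadratically in n.

```r
# Assumed test data: an n x 10 matrix of uniform random numbers
set.seed(1)
n <- 10000
x <- matrix(runif(n * 10), nrow = n, ncol = 10)

# M1: apply() over the rows (MARGIN = 1)
t1 <- system.time(y <- apply(x, 1, sum))

# M2: for-loop writing into a preallocated vector
t2 <- system.time({
  z <- numeric(n)
  for (i in seq_len(n)) z[i] <- sum(x[i, ])
})

# M3: for-loop that grows the result with c(), copying it each iteration
t3 <- system.time({
  z1 <- NULL
  for (i in seq_len(n)) z1 <- c(z1, sum(x[i, ]))
})

# All three methods should give the same row sums
stopifnot(isTRUE(all.equal(y, z)), isTRUE(all.equal(z, z1)))

# User CPU, system CPU and elapsed time (seconds) for each method
times <- rbind(M1 = unclass(t1), M2 = unclass(t2), M3 = unclass(t3))
print(times[, c("user.self", "sys.self", "elapsed")])
```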
There are at least two other methods that would be interesting to
try. Because you are interested in row sums, I imagine by far the
fastest method would be (M4)
y <- x %*% rep(1, ncol(x))
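A small sketch of why M4 computes row sums: row i of the product x %*% rep(1, ncol(x)) is the sum over j of x[i, j] * 1. The product is an n x 1 matrix, so drop() is used here to obtain a plain vector. The 5 x 3 test matrix is an assumption for illustration, and rowSums(), which current versions of R provide in base, is shown only as a cross-check; it is not part of the original thread.

```r
# Illustrative 5 x 3 matrix (an assumption; the post used n x 10 matrices)
x <- matrix(as.numeric(1:15), nrow = 5, ncol = 3)

# M4: multiply by a column of ones. Row i of the product is
# sum_j x[i, j] * 1, i.e. the i-th row sum. The result is a 5 x 1
# matrix, so drop() turns it into an ordinary vector.
y4 <- drop(x %*% rep(1, ncol(x)))

# Same answer as M1 (apply) and as base R's rowSums()
stopifnot(isTRUE(all.equal(y4, apply(x, 1, sum))),
          isTRUE(all.equal(y4, rowSums(x))))

print(y4)
# [1] 18 21 24 27 30
```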
You may also find that (M5)
y <- apply( t(x), 2, sum )
would be faster than M1 because of the way R handles arrays (R stores
matrices in column-major order).
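A sketch of M5, assuming the column-major explanation: R stores a matrix column by column, so after t(x) each original row becomes a column whose elements are adjacent in memory. Whether that actually makes apply() faster depends on the R version; the example only checks the storage order and that M1 and M5 agree. The test matrix is an assumption for illustration.

```r
# Illustrative 5 x 3 matrix (an assumption)
x <- matrix(as.numeric(1:15), nrow = 5, ncol = 3)

# Column-major storage: the first 5 stored elements are column 1
stopifnot(identical(as.vector(x)[1:5], x[, 1]))

# M5: transpose, so each row of x becomes a column of t(x),
# then sum over columns (MARGIN = 2)
y5 <- apply(t(x), 2, sum)

# M1 for comparison: sum over rows directly (MARGIN = 1)
y1 <- apply(x, 1, sum)
stopifnot(identical(y5, y1))
```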
When looking at this table we should keep in mind that the maximum
time for the n = 10000 case is about 2 minutes. If the computation is
worth doing, it may be worth waiting 2 minutes for the result. I
remember the days of running S version 2 (i.e. the version before "New
S") on a VAX-11/750. This sort of computation could take many, many
hours on the only computer in the department, so relative differences
in speed between methods were a lot more important. One got
used to rephrasing computations in "efficient" ways. Today I think
that clarity is usually more important than efficiency.
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html