[Rd] the incredible lightness of crossprod
Patrick Burns
pburns at pburns.seanet.com
Thu Jan 27 16:25:30 CET 2005
The following is at least as much out of intellectual curiosity
as for practical reasons.
On reviewing some code written by novices to R, I came
across:
crossprod(x, y)[1,1]
I thought, "That isn't a very S way of saying that, I wonder
what the penalty is for using 'crossprod'." To my surprise the
penalty was substantially negative. Handily the client had S-PLUS
as well -- there the sign of the penalty was as I had expected, but
its magnitude was far larger than I had anticipated.
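For reference, 'crossprod(x, y)' computes t(x) %*% y and returns a
1 x 1 matrix, so the '[1,1]' extracts the scalar inner product; the
conventional S idiom for the same quantity is 'sum(x * y)'.  A small
sketch (the vectors here are just illustrative, not the client's data):

    x <- rnorm(1000)        # illustrative vectors of length 1000
    y <- rnorm(1000)
    crossprod(x, y)[1, 1]   # t(x) %*% y as a 1 x 1 matrix, then the scalar
    sum(x * y)              # the usual vectorized idiom; same value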
Here are the timings of 1 million computations on vectors of
length 1000. This is under Windows, R version 1.9.1 and S-PLUS
6.2 (on the same machine).
Command                        R     S-PLUS
sum(x * y)                 28.61       97.6
crossprod(x, y)[1,1]        6.77     2256.2
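(The exact benchmarking code is not shown above; a minimal sketch of
how such timings could be reproduced, with the repetition count as in
the text, would be something like:

    x <- rnorm(1000)   # illustrative vectors of length 1000
    y <- rnorm(1000)
    n.rep <- 1e6       # 1 million repetitions, as in the timings above

    system.time(for (i in 1:n.rep) sum(x * y))
    system.time(for (i in 1:n.rep) crossprod(x, y)[1, 1])

where 'n.rep' is just a name chosen here for illustration.)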
Another example is computing the column sums of a matrix.
Consider:
set.seed(1)
jjm <- matrix(rnorm(600), 5)
Timings for this under Windows 2000 with R version 2.0.1 (on an
old chip running at about 0.7 GHz) for 100,000 computations are:
apply(jjm, 2, sum)          536.59
colSums(jjm)                 18.26
rep(1,5) %*% jjm             15.41
crossprod(rep(1,5), jjm)     13.16
(These timings seem to be stable across R versions and on at least
one Linux platform.)
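As a sanity check (a sketch, not part of the original timings), all
four expressions give the same column sums, though the last two
return a 1 x 120 matrix rather than a plain vector:

    a1 <- apply(jjm, 2, sum)
    a2 <- colSums(jjm)
    a3 <- rep(1,5) %*% jjm            # 1 x 120 matrix
    a4 <- crossprod(rep(1,5), jjm)    # 1 x 120 matrix

    all.equal(a1, a2)                 # TRUE
    all.equal(a1, as.vector(a3))      # TRUE
    all.equal(a1, as.vector(a4))      # TRUE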
Andy Liaw showed another example of 'crossprod' being fast a couple
of days ago on R-help.
Questions for those with a more global picture of the code:
* Is the speed advantage of 'crossprod' inherent, or is it because
more care has been taken with its implementation than the other
functions?
* Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
going to BLAS while 'sum' can't?
* Would it make sense to (essentially) use 'crossprod' in
'colSums' and its friends at least for the special case of matrices?
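On the last question, a hypothetical sketch of what that might look
like (not the actual R sources; the function name is made up for
illustration):

    # Hypothetical illustration only -- not how 'colSums' is coded.
    col.sums.via.crossprod <- function(x) {
        if (is.matrix(x) && is.numeric(x)) {
            # crossprod(rep(1, nrow(x)), x) is t(rep(1, nrow(x))) %*% x,
            # which goes through the BLAS; drop() gives a plain vector.
            drop(crossprod(rep(1, nrow(x)), x))
        } else {
            colSums(x)   # fall back on the existing function otherwise
        }
    }

    col.sums.via.crossprod(jjm)   # same values as colSums(jjm)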
Patrick Burns
Burns Statistics
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")