[Rd] the incredible lightness of crossprod

Douglas Bates bates at stat.wisc.edu
Thu Jan 27 19:51:26 CET 2005


Patrick Burns wrote:
> The following is at least as much out of intellectual curiosity
> as for practical reasons.
> On reviewing some code written by novices to R, I came
> across:
> 
> crossprod(x, y)[1,1]
> 
> I thought, "That isn't a very S way of saying that, I wonder
> what the penalty is for using 'crossprod'."  To my surprise the
> penalty was substantially negative.  Handily the client had S-PLUS
> as well -- there the sign of the penalty was as I had expected, but
> the order of magnitude was off.
> 
> Here are the timings of 1 million computations on vectors of
> length 1000.  This is under Windows, R version 1.9.1 and S-PLUS
> 6.2 (on the same machine).
> 
> Command                                   R       S-PLUS
> sum(x * y)                            28.61         97.6
> crossprod(x, y)[1,1]                   6.77       2256.2
> 
> 
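A timing along these lines can be reproduced with something like the
following sketch (the original benchmark code was not posted, so the
repetition count and the use of system.time() here are only
illustrative):

set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
n <- 100000                    # fewer repetitions than the 1e6 above

system.time(for (i in 1:n) s1 <- sum(x * y))
system.time(for (i in 1:n) s2 <- crossprod(x, y)[1, 1])

all.equal(s1, s2)              # the two forms agree numerically
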
> Another example is when computing the sums of the columns of a
> matrix.  For example:
> 
> set.seed(1)
> jjm <- matrix(rnorm(600), 5)
> 
> Timings for this under Windows 2000 with R version 2.0.1 (on an
> old chip running at about 0.7 GHz) for 100,000 computations are:
> 
> apply(jjm, 2, sum)         536.59
> colSums(jjm)                18.26
> rep(1,5) %*% jjm            15.41
> crossprod(rep(1,5), jjm)    13.16
> 
> (These timings seem to be stable across R versions and on at least
> one Linux platform.)
> 
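The column-sum comparison can be reproduced in the same spirit; again
this is only a sketch, with a smaller repetition count and
system.time() standing in for whatever harness was actually used:

set.seed(1)
jjm <- matrix(rnorm(600), 5)   # 5 x 120 matrix, as above
n <- 10000

system.time(for (i in 1:n) s1 <- apply(jjm, 2, sum))
system.time(for (i in 1:n) s2 <- colSums(jjm))
system.time(for (i in 1:n) s3 <- rep(1, 5) %*% jjm)
system.time(for (i in 1:n) s4 <- crossprod(rep(1, 5), jjm))

# apply() and colSums() return plain vectors, while the matrix-product
# forms return 1 x 120 matrices, hence the drop() before comparing.
all.equal(s1, drop(s3))
all.equal(s2, drop(s4))
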
> Andy Liaw showed another example of 'crossprod' being fast a couple
> of days ago on R-help.
> 
> Questions for those with a more global picture of the code:
> 
> *  Is the speed advantage of 'crossprod' inherent, or is it because
> more care has been taken with its implementation than with the other
> functions?
> 
> *  Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
> going to BLAS while 'sum' can't?

For a numeric matrix, crossprod ends up calling level-3 BLAS: dsyrk
for the single-argument case or dgemm for the two-argument case.
Especially in accelerated versions of the BLAS, such as ATLAS or
Goto's BLAS, those routines are hideously efficient, and that is
where you are seeing the big gain in speed.
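
Concretely, the two forms compute the following products; a small
check just to make the mapping explicit (dsyrk handles the symmetric
one-argument case, dgemm the general two-argument case):

X <- matrix(rnorm(20), 4)
Y <- matrix(rnorm(20), 4)

all.equal(crossprod(X),     t(X) %*% X)    # one argument  -> dsyrk
all.equal(crossprod(X, Y),  t(X) %*% Y)    # two arguments -> dgemm
all.equal(tcrossprod(X, Y), X %*% t(Y))    # the t() on the other side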

By the way, you didn't mention whether you have an accelerated BLAS
installed with R.  Do you?
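
For later readers: in more recent versions of R, sessionInfo()
reports which BLAS and LAPACK libraries are in use; a cruder
empirical check is simply to time a large crossprod, since an
accelerated BLAS makes a very visible difference there. For example:

# In recent versions of R, sessionInfo() lists the BLAS/LAPACK in use.
sessionInfo()

# Rough empirical check: a 1000 x 1000 crossprod is dominated by the
# BLAS, so an accelerated library shows up as a large speed gain here.
X <- matrix(rnorm(1e6), 1000)
system.time(crossprod(X))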


