[Rd] the incredible lightness of crossprod

Paul Gilbert pgilbert at bank-banque-canada.ca
Thu Jan 27 20:53:17 CET 2005


A few weeks ago I noticed

 >  z <- matrix(rnorm(20000),10000,2)

 > system.time(for (i in 1:1000) apply(z,2,sum))
[1] 13.44  0.48 14.08  0.00  0.00

 > system.time(for (i in 1:1000) rep(1,10000) %*% z)
[1] 6.46 0.11 6.84 0.00 0.00

which seemed completely contrary to all my childhood teachings. Now

 > system.time(for (i in 1:1000) crossprod(rep(1,10000), z))
[1] 1.90 0.12 2.24 0.00 0.00

makes sense because it is supposed to be faster than %*%, but why is 
apply so slow?
(And should I go back and change apply in my code everywhere or is this 
likely to reverse again?)
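
My rough mental model of what apply() has to do here (a sketch only, not its
actual source, and the real function does more bookkeeping than this) is an
R-level loop over the columns:

col_sums_apply_style <- function(z) {
    # interpreted loop: one column copy and one R-level call to sum() per column
    ans <- numeric(ncol(z))
    for (j in 1:ncol(z))
        ans[j] <- sum(z[, j])
    ans
}

whereas crossprod(rep(1, 10000), z) hands the whole job to compiled code in a
single call.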

Paul Gilbert
 

Patrick Burns wrote:

> The following is at least as much out of intellectual curiosity
> as for practical reasons.
> On reviewing some code written by novices to R, I came
> across:
>
> crossprod(x, y)[1,1]
>
> I thought, "That isn't a very S way of saying that; I wonder
> what the penalty is for using 'crossprod'." To my surprise the
> penalty was substantially negative.  Handily the client had S-PLUS
> as well -- there the sign of the penalty was as I had expected, but
> the order of magnitude was off.
>
> Here are the timings of 1 million computations on vectors of
> length 1000.  This is under Windows, R version 1.9.1 and S-PLUS
> 6.2 (on the same machine).
>
> Command                      R    S-PLUS
> sum(x * y)               28.61      97.6
> crossprod(x, y)[1,1]      6.77    2256.2
>
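A sketch of roughly how such a timing can be run (the test vectors and the
exact wrapper below are my own guesses; absolute numbers will differ by
machine and BLAS):

set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
system.time(for (i in 1:1e6) sum(x * y))             # allocates x * y, then sums it
system.time(for (i in 1:1e6) crossprod(x, y)[1, 1])  # single compiled call, 1x1 result
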
>
> Another example is when computing the sums of the columns of a
> matrix.  For example:
>
> set.seed(1)
> jjm <- matrix(rnorm(600), 5)
>
> Timings for this under Windows 2000 with R version 2.0.1 (on an
> old chip running at about 0.7 GHz) for 100,000 computations are:
>
> apply(jjm, 2, sum)          536.59
> colSums(jjm)                 18.26
> rep(1,5) %*% jjm             15.41
> crossprod(rep(1,5), jjm)     13.16
>
> (These timings seem to be stable across R versions and on at least
> one Linux platform.)
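A guess at how these timings can be reproduced (the 100,000 count is from the
text above; the timing wrapper itself is an assumption):

set.seed(1)
jjm <- matrix(rnorm(600), 5)
system.time(for (i in 1:100000) apply(jjm, 2, sum))
system.time(for (i in 1:100000) colSums(jjm))
system.time(for (i in 1:100000) rep(1, 5) %*% jjm)
system.time(for (i in 1:100000) crossprod(rep(1, 5), jjm))
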
>
> Andy Liaw showed another example of 'crossprod' being fast a couple
> of days ago on R-help.
>
> Questions for those with a more global picture of the code:
>
> *  Is the speed advantage of 'crossprod' inherent, or is it because
> more care has been taken with its implementation than the other
> functions?
>
> *  Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
> going to BLAS while 'sum' can't?
>
> *  Would it make sense to (essentially) use 'crossprod' in
> 'colSums' and its friends at least for the special case of matrices?
>
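One way the idea in the last question might look, purely as a sketch (the
name colSumsViaCrossprod is made up, and this is not how colSums is actually
implemented):

colSumsViaCrossprod <- function(m) drop(crossprod(rep(1, nrow(m)), m))

m <- matrix(1:12, 3)
colSumsViaCrossprod(m)                          # 6 15 24 33
all.equal(colSumsViaCrossprod(m), colSums(m))   # TRUE
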
> Patrick Burns
>
> Burns Statistics
> patrick at burns-stat.com
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of S Poetry and "A Guide for the Unwilling S User")
>


