[Rd] the incredible lightness of crossprod

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Jan 27 21:23:39 CET 2005


On Thu, 27 Jan 2005, Paul Gilbert wrote:

> A few weeks ago I noticed
>
>>  z <- matrix(rnorm(20000),10000,2)
>
>> system.time(for (i in 1:1000) apply(z,2,sum))
> [1] 13.44  0.48 14.08  0.00  0.00
>
>> system.time(for (i in 1:1000) rep(1,10000) %*% z)
> [1] 6.46 0.11 6.84 0.00 0.00

So both run in a few milliseconds for rather large problems.

> which seemed  completely contrary to all my childhood teachings. Now
>
>> system.time(for (i in 1:1000) crossprod(rep(1,10000), z))
> [1] 1.90 0.12 2.24 0.00 0.00
>
> makes sense because it is suppose to be faster than %*% , but why is apply so 
> slow?

`so slow' sic: what are you going to do in the 7ms you saved?

> (And should I go back and change apply in my code everywhere or is this 
> likely to reverse again?)

It's not likely.  apply is an R-level loop, and  %*% is a C-level one.
However,  %*% is not supposed to be much slower than crossprod, and the 
devil is in the details of how the BLAS is implemented: the code is very 
similar.

That %*% was faster than apply has been true in all my (17 years) of S/R 
experience.  Your childhood may predate S3, of course.

I still think one should use row/colSums for clarity with acceptable
efficiency.  It must be very unusual for these operations to be a dominant 
part of a calculation, so let's not lose proportion here.


> Paul Gilbert
>
>
> Patrick Burns wrote:
>
>> The following is at least as much out of intellectual curiosity
>> as for practical reasons.
>> On reviewing some code written by novices to R, I came
>> across:
>> 
>> crossprod(x, y)[1,1]
>> 
>> I  thought, "That isn't a very S way of saying that,  I wonder
>> what the penalty is for using 'crossprod'."  To my surprise the
>> penalty was substantially negative.  Handily the client had S-PLUS
>> as well -- there the sign of the penalty was as I had expected, but
>> the order of magnitude was off.
>> 
>> Here are the timings of 1 million computations on vectors of
>> length 1000.  This is under Windows, R version 1.9.1 and S-PLUS
>> 6.2 (on the same machine).
>> 
>> Command                               R                        S-PLUS
>> sum(x * y)                              28.61                        97.6
>> crossprod(x, y)[1,1]                 6.77                     2256.2
>> 
>> 
>> Another example is when computing the sums of the columns of a
>> matrix.  For example:
>> 
>> set.seed(1)
>> jjm <- matrix(rnorm(600), 5)
>> 
>> Timings for this under Windows 2000 with R version 2.0.1 (on an
>> old chip running at about 0.7Ghz) for 100,000 computations are:
>> 
>> apply(jjm, 2, sum)               536.59
>> colSums(jjm)                         18.26
>> rep(1,5) %*% jjm                 15.41
>> crossprod(rep(1,5), jjm)        13.16
>> 
>> (These timings seem to be stable across R versions and on at least
>> one Linux platform.)
>> 
>> Andy Liaw showed another example of 'crossprod' being fast a couple
>> days ago on R-help.
>> 
>> Questions for those with a more global picture of the code:
>> 
>> *  Is the speed advantage of 'crossprod' inherent, or is it because
>> more care has been taken with its implementation than the other
>> functions?
>> 
>> *  Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
>> going to BLAS while 'sum' can't?
>> 
>> *  Would it make sense to (essentially) use 'crossprod' in
>> 'colSums' and its friends at least for the special case of matrices?
>> 
>> Patrick Burns
>> 
>> Burns Statistics
>> patrick at burns-stat.com
>> +44 (0)20 8525 0696
>> http://www.burns-stat.com
>> (home of S Poetry and "A Guide for the Unwilling S User")
>> 
>> ______________________________________________
>> R-devel at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>
> ______________________________________________
> R-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-devel mailing list