[R] speed up in R apply
David Winsemius
dwinsemius at comcast.net
Wed Jan 5 22:09:55 CET 2011
On Jan 5, 2011, at 2:40 PM, Douglas Bates wrote:
> On Wed, Jan 5, 2011 at 1:22 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Jan 5, 2011, at 10:03 AM, Young Cho wrote:
>>
>>> Hi,
>>>
>>> I am doing some simulations and found a bottleneck in my R script. I made
>>> an example:
>>>
>>>> a = matrix(rnorm(5000000),1000000,5)
>>>> tt = Sys.time(); sum(a[,1]*a[,2]*a[,3]*a[,4]*a[,5]); Sys.time() - tt
>>>
>>> [1] -1291.026
>>> Time difference of 0.2354031 secs
>>>>
>>>> tt = Sys.time(); sum(apply(a,1,prod)); Sys.time() - tt
>>>
>>> [1] -1291.026
>>> Time difference of 20.23150 secs
>>>
>>> Is there a faster way of calculating the sum of products (of columns, or of
>>> rows)?
>>
>> You should look at crossprod and tcrossprod.
>
> Hmm. Not sure that would help, David. You could use a matrix
> multiplication of a %*% rep(1, ncol(a)) if you wanted the row sums but
> of course you could also use rowSums to get those.
Thanks for pointing that out. I misread the OP's code.
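For the record, here is a small sketch of Douglas's point on a toy matrix. The name m and its values are made up for illustration and are not the OP's data. Multiplying by a vector of ones gives the row sums, the same as rowSums, which is a different quantity from the row products the OP wanted to sum.

```r
# Toy 3 x 2 matrix (hypothetical example data, not the OP's)
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)

# Row sums via matrix multiplication by a vector of ones;
# drop() turns the resulting 3 x 1 matrix into a plain vector
rs_mm <- drop(m %*% rep(1, ncol(m)))

# Row sums via the dedicated function
rs_fn <- rowSums(m)

# Both approaches agree
stopifnot(isTRUE(all.equal(rs_mm, rs_fn)))

# Row *products* are a different quantity from row sums
rp <- m[, 1] * m[, 2]
```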
>
>>> And is this an expected behavior?
>>
>> Yes. For loops and *apply strategies are slower than the proper use of
>> vectorized functions.
>
> To expand a bit on David's point, the apply function isn't magic. It
> essentially loops over the rows, in this case. By multiplying columns
> together you are performing the looping over the rows in compiled
> code, which is much, much faster. If you want to do this kind of
> operation effectively in R for a general matrix (i.e. not knowing in
> advance that it has exactly 5 columns) you could use Reduce
>
>> a <- matrix(rnorm(5000000),1000000,5)
>> system.time(pr1 <- a[,1]*a[,2]*a[,3]*a[,4]*a[,5])
> user system elapsed
> 0.15 0.09 0.37
>> system.time(pr2 <- apply(a, 1, prod))
> user system elapsed
> 22.090 0.140 22.902
>> all.equal(pr1, pr2)
> [1] TRUE
>> system.time(pr3 <- Reduce(get("*"), as.data.frame(a), rep(1, nrow(a))))
Slightly faster would be:
system.time(pr3 <- Reduce("*", as.data.frame(a)))
And thanks for the nice example. Using a data.frame to feed Reduce
materially enhances its value to me.
> user system elapsed
> 0.410 0.010 0.575
>> all.equal(pr3, pr2)
> [1] TRUE
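The Reduce idea can be wrapped up for a matrix with any number of columns. What follows is only a sketch: row_prod is a hypothetical helper name, and it feeds Reduce a list of columns built with lapply rather than going through as.data.frame as in the calls above. The check uses a small matrix b so that the apply comparison runs quickly.

```r
# Hypothetical helper: row products for a matrix with any number of columns.
# Each Reduce step multiplies two full-length column vectors, so the loop
# over rows happens in compiled code; only the loop over columns is in R.
row_prod <- function(m) {
  stopifnot(is.matrix(m), ncol(m) >= 1)
  Reduce(`*`, lapply(seq_len(ncol(m)), function(j) m[, j]))
}

set.seed(1)
b <- matrix(rnorm(20), nrow = 5, ncol = 4)

# Agrees with the slower row-by-row version
stopifnot(isTRUE(all.equal(row_prod(b), apply(b, 1, prod))))

# Sum of row products, the quantity asked about in the original post
total <- sum(row_prod(b))
```

The same helper would apply to the 1000000 x 5 matrix a from the thread; no timings are claimed for it here.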
--
David Winsemius, MD
West Hartford, CT