[R] speed up in R apply

David Winsemius dwinsemius at comcast.net
Wed Jan 5 22:09:55 CET 2011


On Jan 5, 2011, at 2:40 PM, Douglas Bates wrote:

> On Wed, Jan 5, 2011 at 1:22 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Jan 5, 2011, at 10:03 AM, Young Cho wrote:
>>
>>> Hi,
>>>
>>> I am doing some simulations and found a bottleneck in my R script. I made
>>> an example:
>>>
>>>> a = matrix(rnorm(5000000),1000000,5)
>>>> tt = Sys.time(); sum(a[,1]*a[,2]*a[,3]*a[,4]*a[,5]); Sys.time() - tt
>>>
>>> [1] -1291.026
>>> Time difference of 0.2354031 secs
>>>>
>>>> tt  = Sys.time(); sum(apply(a,1,prod)); Sys.time() - tt
>>>
>>> [1] -1291.026
>>> Time difference of 20.23150 secs
>>>
>>> Is there a faster way of calculating the sum of products (of columns,
>>> or of rows)?
>>
>> You should look at crossprod and tcrossprod.
>
> Hmm.  Not sure that would help, David.  You could use a matrix
> multiplication, a %*% rep(1, ncol(a)), if you wanted the row sums, but
> of course you could also use rowSums to get those.

Thanks for pointing that out. I misread the OP's code.
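Bates's remark about row sums is easy to check on a small matrix. A minimal sketch; the matrix `m` and its values are illustrative and not taken from the thread:

```r
# Small illustrative matrix (not the OP's data)
m <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)

# Multiplying by a vector of ones sums across each row;
# drop() turns the 2 x 1 result matrix into a plain vector
rs_mult <- drop(m %*% rep(1, ncol(m)))

# rowSums() computes the same thing directly
rs_direct <- rowSums(m)

print(rs_mult)    # [1]  6 15
print(rs_direct)  # [1]  6 15
```

Both give row sums, not row products. crossprod(a) forms t(a) %*% a, i.e. sums of pairwise column products, so it does not directly give the five-way product the OP asked about.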
>
>>> And is this an expected behavior?
>>
>> Yes. Both for loops and *apply strategies are slower than the proper use
>> of vectorized functions.
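The cost in apply(a, 1, prod) comes from one R-level call to prod() per row, a million calls here. A loop over the five columns instead makes only five iterations, each a single vectorized multiply over all rows. A minimal sketch, where `row_prods` and `a_small` are hypothetical names not used in the thread:

```r
# Hypothetical helper: product across columns for every row of a numeric matrix.
# Loops over the (few) columns rather than the (many) rows.
row_prods <- function(x) {
  out <- rep(1, nrow(x))        # running product, one entry per row
  for (j in seq_len(ncol(x))) {
    out <- out * x[, j]         # one vectorized multiply covers all rows
  }
  out
}

set.seed(1)
a_small <- matrix(rnorm(20), nrow = 4, ncol = 5)
all.equal(row_prods(a_small), apply(a_small, 1, prod))
```

prod() may accumulate in extended precision, so the two results can differ in the last bits. That is why the comparison uses all.equal() rather than identical().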
>
> To expand a bit on David's point, the apply function isn't magic.  It
> essentially loops over the rows, in this case.  By multiplying columns
> together you are performing the looping over the rows in compiled
> code, which is much, much faster.  If you want to do this kind of
> operation efficiently in R for a general matrix (i.e. not knowing in
> advance that it has exactly 5 columns) you could use Reduce:
>
>> a <- matrix(rnorm(5000000),1000000,5)
>> system.time(pr1 <- a[,1]*a[,2]*a[,3]*a[,4]*a[,5])
>   user  system elapsed
>   0.15    0.09    0.37
>> system.time(pr2 <- apply(a, 1, prod))
>   user  system elapsed
> 22.090   0.140  22.902
>> all.equal(pr1, pr2)
> [1] TRUE
>> system.time(pr3 <- Reduce(get("*"), as.data.frame(a), rep(1, nrow(a))))

Slightly faster would be:

system.time(pr3 <- Reduce("*", as.data.frame(a)))
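The two Reduce calls differ only in the starting value. Bates's version seeds the fold with a vector of ones. David's omits `init`, so Reduce starts from the first column and does one multiplication fewer. Reduce() passes `f` through match.fun(), so the string "*" and get("*") refer to the same function. A quick check on a small hypothetical matrix `m`:

```r
set.seed(7)
m <- matrix(rnorm(30), nrow = 6, ncol = 5)
df <- as.data.frame(m)   # each column becomes one list element

# Bates's form: fold starting from a vector of ones
p_init <- Reduce(get("*"), df, rep(1, nrow(m)))
# David's form: no init, so the fold starts from the first column
p_noinit <- Reduce("*", df)

all.equal(p_init, p_noinit)
```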

And thanks for the nice example. Using a data.frame to feed Reduce
materially enhances its value to me.

>   user  system elapsed
>  0.410   0.010   0.575
>> all.equal(pr3, pr2)
> [1] TRUE
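The OP's actual quantity, the sum over rows of the row products, follows by wrapping the Reduce idiom in sum(). A minimal sketch for a numeric matrix with any number of columns; `sum_row_prods` and `a_small` are hypothetical names:

```r
# Hypothetical helper: sum over rows of the product across columns.
# as.data.frame() turns each column into a list element, so
# Reduce("*", ...) multiplies the columns together element-wise.
sum_row_prods <- function(x) {
  sum(Reduce("*", as.data.frame(x)))
}

set.seed(42)
a_small <- matrix(rnorm(50), nrow = 10, ncol = 5)
all.equal(sum_row_prods(a_small), sum(apply(a_small, 1, prod)))
```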

--
David Winsemius, MD
West Hartford, CT


