[R] speed up in R apply

Uwe Ligges ligges at statistik.tu-dortmund.de
Thu Jan 6 10:09:40 CET 2011



On 05.01.2011 22:49, Young Cho wrote:
> When introduced to R, I learned to use *apply whenever I could to avoid
> for-loops and all. And, having got into the habit, I think I somehow
> developed the misconception that it is something magic, always an
> optimal way of coding in R.

That is right: your apply() call emulates a loop over all rows, and
vectorized solutions are almost always preferable.

If you run the apply() approach over the other dimension of the matrix
(the 5 columns), you will find that it is as fast as the vectorized
solution, since only 5 iterations are then required.
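For this problem, apply(a, 2, prod) would return the five column products rather than the row products, so looping over the other dimension here means multiplying whole columns together. Below is a minimal sketch, assuming a smaller 100,000 x 5 matrix than the one in the original post so that the row-wise apply() comparison finishes quickly. The names row_prods and j are illustrative only.

```r
# Assumption: a 1e5 x 5 stand-in for the 1e6 x 5 matrix in the post.
set.seed(1)
a <- matrix(rnorm(5e5), nrow = 1e5, ncol = 5)

# Loop over the columns: only ncol(a) = 5 iterations run in R, and each
# one multiplies two length-nrow(a) vectors in compiled code.
row_prods <- rep(1, nrow(a))
for (j in seq_len(ncol(a))) {
  row_prods <- row_prods * a[, j]
}

# The row-wise apply() runs nrow(a) iterations but gives the same values.
all.equal(row_prods, apply(a, 1, prod))  # TRUE
sum(row_prods)
```

The loop body runs only ncol(a) times, and each multiplication processes a whole column at once. That is why this direction of looping costs about the same as the fully vectorized expression.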

Uwe Ligges


> Thanks a lot for all of your helpful advice and comments!
>
> Young
>
> On Wed, Jan 5, 2011 at 3:09 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>>
>> On Jan 5, 2011, at 2:40 PM, Douglas Bates wrote:
>>
>>> On Wed, Jan 5, 2011 at 1:22 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>
>>>>
>>>> On Jan 5, 2011, at 10:03 AM, Young Cho wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am doing some simulations and found a bottleneck in my R script. I
>>>>> made an example:
>>>>>
>>>>> > a = matrix(rnorm(5000000),1000000,5)
>>>>> > tt  = Sys.time(); sum(a[,1]*a[,2]*a[,3]*a[,4]*a[,5]); Sys.time() - tt
>>>>> [1] -1291.026
>>>>> Time difference of 0.2354031 secs
>>>>>
>>>>> > tt  = Sys.time(); sum(apply(a,1,prod)); Sys.time() - tt
>>>>> [1] -1291.026
>>>>> Time difference of 20.23150 secs
>>>>>
>>>>> Is there a faster way of calculating the sum of products (of columns
>>>>> or of rows)?
>>>>>
>>>>
>>>> You should look at crossprod and tcrossprod.
>>>>
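For reference, crossprod(x) computes t(x) %*% x, so each of its entries is a sum of products of one pair of columns. The small sketch below uses a random 10 x 2 matrix (an illustrative example, not taken from the thread). It shows that these entries involve only two columns at a time, which is why crossprod does not directly give the sum over rows of a five-way product.

```r
set.seed(1)
x <- matrix(rnorm(20), nrow = 10, ncol = 2)

cp <- crossprod(x)   # 2 x 2 matrix, equal to t(x) %*% x
# The off-diagonal entry is the sum over rows of x[, 1] * x[, 2]:
# a pairwise sum of products, not a product across all columns.
all.equal(cp[1, 2], sum(x[, 1] * x[, 2]))  # TRUE
```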
>>>
>>> Hmm.  Not sure that would help, David.  You could use a matrix
>>> multiplication of a %*% rep(1, ncol(a)) if you wanted the row sums but
>>> of course you could also use rowSums to get those.
>>>
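Here is a small sketch of that remark on a toy 2 x 3 integer matrix (assumed for illustration). Multiplying by a vector of ones and calling rowSums() give the same row sums.

```r
m <- matrix(1:6, nrow = 2)   # toy matrix: rows are 1 3 5 and 2 4 6
ones <- rep(1, ncol(m))

drop(m %*% ones)   # matrix product gives a 2 x 1 matrix; drop() makes it a vector
rowSums(m)         # same values, 9 and 12, without the matrix product
```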
>>
>> Thanks for pointing  that out. I misread the OP's code.
>>
>>
>>>>> And is this expected behavior?
>>>>>
>>>>
>>>> Yes. For loops and *apply strategies are slower than the proper use of
>>>> vectorized functions.
>>>>
>>>
>>> To expand a bit on David's point, the apply function isn't magic.  It
>>> essentially loops over the rows, in this case.  By multiplying columns
>>> together you are performing the looping over the rows in compiled
>>> code, which is much, much faster.  If you want to do this kind of
>>> operation effectively in R for a general matrix (i.e. not knowing in
>>> advance that it has exactly 5 columns) you could use Reduce:
>>>
>>> > a <- matrix(rnorm(5000000),1000000,5)
>>> > system.time(pr1 <- a[,1]*a[,2]*a[,3]*a[,4]*a[,5])
>>>    user  system elapsed
>>>    0.15    0.09    0.37
>>>
>>> > system.time(pr2 <- apply(a, 1, prod))
>>>    user  system elapsed
>>>  22.090   0.140  22.902
>>>
>>> > all.equal(pr1, pr2)
>>> [1] TRUE
>>>
>>> > system.time(pr3 <- Reduce(get("*"), as.data.frame(a), rep(1, nrow(a))))
>>>
>> Slightly faster would be:
>>
>> system.time(pr3 <- Reduce("*", as.data.frame(a)))
>>
>> And thanks for the nice example. Using a data.frame to feed Reduce
>> materially enhances its value to me.
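A data frame is a list of columns, which is why Reduce() can fold `*` over as.data.frame(a). Below is a sketch of the same fold over a plain list of column vectors. It assumes the column list is built with lapply() and uses a 100,000 x 5 matrix for speed. The names cols, pr_df and pr_list are illustrative only.

```r
# Assumption: a 1e5 x 5 matrix, smaller than the one in the thread.
set.seed(1)
a <- matrix(rnorm(5e5), nrow = 1e5, ncol = 5)

# as.data.frame() turns the matrix into a list of 5 columns, and
# Reduce("*", ...) computes (((V1 * V2) * V3) * V4) * V5 element-wise.
pr_df <- Reduce("*", as.data.frame(a))

# The same fold over a plain list of column vectors, without a data frame.
cols <- lapply(seq_len(ncol(a)), function(j) a[, j])
pr_list <- Reduce("*", cols)

all.equal(pr_df, pr_list)  # TRUE
```

Either form works for a matrix with any number of columns, since Reduce() only needs a list whose elements can be multiplied pairwise.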
>>
>>
>>>    user  system elapsed
>>>   0.410   0.010   0.575
>>>
>>> > all.equal(pr3, pr2)
>>> [1] TRUE
>>>
>>
>> --
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


