[R] speed up in R apply
Liviu Andronic
landronimirc at gmail.com
Thu Jan 6 09:53:44 CET 2011
On Wed, Jan 5, 2011 at 10:49 PM, Young Cho <young.stat at gmail.com> wrote:
> When I was introduced to R, I learned to use *apply wherever I could to avoid
> for-loops. Having got into the habit, I somehow picked up the
> misconception that it is a magic sauce, always the optimal way of coding in
> R.
>
See [1] for an article on vectorisation and loops in R.
Liviu
[1] http://www.r-project.org/doc/Rnews/Rnews_2008-1.pdf
> Thanks a lot for all of your helpful advice and comment!
>
> Young
>
> On Wed, Jan 5, 2011 at 3:09 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>>
>> On Jan 5, 2011, at 2:40 PM, Douglas Bates wrote:
>>
>>> On Wed, Jan 5, 2011 at 1:22 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>
>>>>
>>>> On Jan 5, 2011, at 10:03 AM, Young Cho wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am doing some simulations and found a bottleneck in my R script. I
>>>>> made an example:
>>>>>
>>>>> > a = matrix(rnorm(5000000),1000000,5)
>>>>> > tt = Sys.time(); sum(a[,1]*a[,2]*a[,3]*a[,4]*a[,5]); Sys.time() - tt
>>>>> [1] -1291.026
>>>>> Time difference of 0.2354031 secs
>>>>>
>>>>> > tt = Sys.time(); sum(apply(a,1,prod)); Sys.time() - tt
>>>>> [1] -1291.026
>>>>> Time difference of 20.23150 secs
>>>>>
>>>>> Is there a faster way of calculating the sum of products (of columns, or
>>>>> of rows)?
>>>>>
>>>>
>>>> You should look at crossprod and tcrossprod.
>>>>
>>>
>>> Hmm. Not sure that would help, David. You could use a matrix
>>> multiplication of a %*% rep(1, ncol(a)) if you wanted the row sums but
>>> of course you could also use rowSums to get those.
>>>
>>
>> Thanks for pointing that out. I misread the OP's code.
>>
>>
>>>>> And is this expected behavior?
>>>>>
>>>>
>>>> Yes. For loops and *apply strategies are slower than the proper use of
>>>> vectorized functions.
>>>>
>>>
>>> To expand a bit on David's point, the apply function isn't magic. It
>>> essentially loops over the rows, in this case. By multiplying columns
>>> together you are performing the looping over the rows in compiled
>>> code, which is much, much faster. If you want to do this kind of
>>> operation effectively in R for a general matrix (i.e. not knowing in
>>> advance that it has exactly 5 columns) you could use Reduce:
>>>
>>> > a <- matrix(rnorm(5000000),1000000,5)
>>> > system.time(pr1 <- a[,1]*a[,2]*a[,3]*a[,4]*a[,5])
>>>    user  system elapsed
>>>    0.15    0.09    0.37
>>> > system.time(pr2 <- apply(a, 1, prod))
>>>    user  system elapsed
>>>  22.090   0.140  22.902
>>> > all.equal(pr1, pr2)
>>> [1] TRUE
>>> > system.time(pr3 <- Reduce(get("*"), as.data.frame(a), rep(1, nrow(a))))
>>>
>> Slightly faster would be:
>>
>> system.time(pr3 <- Reduce("*", as.data.frame(a)))
>>
>> And thanks for the nice example. Using a data.frame to feed Reduce
>> materially enhances its value to me.
>>
>>
>>>    user  system elapsed
>>>   0.410   0.010   0.575
>>> > all.equal(pr3, pr2)
>>> [1] TRUE
>>>
>>
>> --
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>
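To make the Reduce idea from the quoted thread concrete for a matrix with any number of columns, here is a minimal sketch. The helper name row_prod() is hypothetical (it does not appear in the thread), the matrix is much smaller than the one benchmarked above, and the seed is arbitrary. It splits the matrix into column vectors with lapply rather than going through as.data.frame, which is a swapped-in variation and not the code Douglas or David posted.

```r
# Illustrative sketch only: row_prod() is a hypothetical helper name.
row_prod <- function(m) {
  # Split the matrix into a list of column vectors ...
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j])
  # ... and multiply them element-wise, one vectorised "*" per column.
  Reduce(`*`, cols)
}

set.seed(1)                          # arbitrary seed, for reproducibility
a <- matrix(rnorm(5000), 1000, 5)    # smaller than the thread's example

pr_apply  <- apply(a, 1, prod)       # calls prod() once per row
pr_reduce <- row_prod(a)             # calls "*" once per additional column

all.equal(pr_apply, pr_reduce)
```

The R-level loop here runs over the columns (5 steps) instead of the rows (1000 steps), with each step handled in compiled code, which is the reason given in the thread for the speed difference. Wrapping either call in system.time() on your own machine is the way to check how large the gap actually is.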