[R] Performance enhancement for ave
Hadley Wickham
hadley at rice.edu
Tue Jun 29 15:11:25 CEST 2010
On Tue, Jun 29, 2010 at 8:02 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
>> dt = data.table(d,key="grp1,grp2")
>> system.time(ans1 <- dt[ , list(mean(x),mean(y)) , by=list(grp1,grp2)])
> user system elapsed
> 3.89 0.00 3.91
> # your 7.064 is 12.23 for me though, so this 3.9 should be faster for you
>
> However, Rprof() shows that 3.9 is mostly dispatch of mean to mean.default
> which then calls .Internal. Because there are so many groups here, dispatch
> bites.
>
> So ...
>
>> system.time(ans2 <- dt[ , list(.Internal(mean(x)),.Internal(mean(y))),
>> by=list(grp1,grp2)])
> user system elapsed
> 0.20 0.00 0.21
Of course, we can perform the same optimisation with ave:
# Call the internal mean directly, skipping S3 dispatch to mean.default
fast_mean <- function(x) .Internal(mean(x))
system.time({
  d$avx <- ave(d$x, interaction(d$grp1, d$grp2, drop = TRUE), FUN = fast_mean)
  d$avy <- ave(d$y, interaction(d$grp1, d$grp2, drop = TRUE), FUN = fast_mean)
})
# user system elapsed
# 3.109 0.188 3.302
Regardless, my point is that there's a simple fix available to make
ave much faster, not that it's the fastest thing out there.
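For anyone wanting to check that swapping in a dispatch-free FUN does not change the answers, here is a minimal sketch on made-up data. The names toy and plain_mean are hypothetical and not from this thread, and the data frame d used above is not reproduced here. The sketch uses sum(x) / length(x) as a stand-in for fast_mean, since .Internal is not meant to be called from user code; it is a different technique from .Internal(mean(x)) and may differ in the last few bits, hence the all.equal comparison.

```r
# Hypothetical toy data; the thread's `d` is not reproduced here.
set.seed(1)
toy <- data.frame(
  grp1 = sample(letters[1:5], 200, replace = TRUE),
  grp2 = sample(LETTERS[1:5], 200, replace = TRUE),
  x    = rnorm(200)
)

# Stand-in for fast_mean: avoids the S3 dispatch of mean() without .Internal.
plain_mean <- function(x) sum(x) / length(x)

# Build the grouping factor once, keeping only combinations that occur.
g <- interaction(toy$grp1, toy$grp2, drop = TRUE)

via_mean  <- ave(toy$x, g, FUN = mean)        # dispatches to mean.default per group
via_plain <- ave(toy$x, g, FUN = plain_mean)  # no S3 dispatch

# Both versions should agree up to floating-point tolerance.
same <- isTRUE(all.equal(via_mean, via_plain))
```

Wrapping each ave call in system.time() on a data set with many groups is one way to see how much of the cost is dispatch on a given machine.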
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/