[R] Performance enhancement for ave
Matthew Dowle
mdowle at mdowle.plus.com
Tue Jun 29 15:02:16 CEST 2010
> dt = data.table(d,key="grp1,grp2")
> system.time(ans1 <- dt[ , list(mean(x),mean(y)) , by=list(grp1,grp2)])
user system elapsed
3.89 0.00 3.91 # your 7.064 is 12.23 for me though, so this
3.9 should be faster for you
However, Rprof() shows that 3.9 is mostly dispatch of mean to mean.default
which then calls .Internal. Because there are so many groups here, dispatch
bites.
So ...
> system.time(ans2 <- dt[ , list(.Internal(mean(x)),.Internal(mean(y))),
> by=list(grp1,grp2)])
user system elapsed
0.20 0.00 0.21
> identical(ans1,ans2)
TRUE
"Hadley Wickham" <hadley at rice.edu> wrote in message
news:AANLkTilH_-3_CycF_fNQMhH6W2oG5Jj5U0YopX_qAgRU at mail.gmail.com...
> library(plyr)
>
> n<-100000
> grp1<-sample(1:750, n, replace=T)
> grp2<-sample(1:750, n, replace=T)
> d<-data.frame(x=rnorm(n), y=rnorm(n), grp1=grp1, grp2=grp2)
>
> system.time({
> d$avx1 <- ave(d$x, list(d$grp1, d$grp2))
> d$avy1 <- ave(d$y, list(d$grp1, d$grp2))
> })
> # user system elapsed
> # 39.300 0.279 40.809
> system.time({
> d$avx2 <- ave(d$x, interaction(d$grp1, d$grp2, drop = T))
> d$avy2 <- ave(d$y, interaction(d$grp1, d$grp2, drop = T))
> })
> # user system elapsed
> # 6.735 0.209 7.064
>
> all.equal(d$avy1, d$avy2)
> # TRUE
> all.equal(d$avx1, d$avx2)
> # TRUE
>
> i.e. ave should use g <- interaction(..., drop = TRUE)
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>
More information about the R-help
mailing list