[R] aggregate function - na.action
Hadley Wickham
hadley at rice.edu
Mon Feb 7 01:41:38 CET 2011
> There's definitely something amiss with aggregate() here since similar
> functions from other packages can reproduce your 'control' sum. I expect
> ddply() will have some timing issues because of all the subgrouping in your
> data frame, but data.table did very well and the summaryBy() function in the
> doBy package did OK:
Well, if you use the right plyr function, it works just fine:
system.time(count(dat, c("x1", "x2", "x3", "x4", "x5", "x6",
  "x7", "x8"), "y"))
# user system elapsed
# 9.754 1.314 11.073
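For readers without the original data, here is a minimal sketch of what that call does. The data frame `toy` and its columns are made up for illustration (the poster's `dat` is not shown in this message); the third argument to count() is `wt_var`, so the "freq" column holds the sum of y within each group rather than a row count.

```r
library(plyr)

# Hypothetical stand-in for the poster's `dat`; names and sizes are invented.
set.seed(1)
toy <- data.frame(
  x1 = sample(c("a", "b"), 20, replace = TRUE),
  x2 = sample(c("u", "v"), 20, replace = TRUE),
  y  = round(runif(20) * 10)
)

# count() with wt_var sums y within each x1/x2 combination;
# the summed column is returned under the name "freq".
by_count <- count(toy, c("x1", "x2"), wt_var = "y")

# Base R equivalent, for comparison.
by_aggregate <- aggregate(y ~ x1 + x2, data = toy, FUN = sum)

# With no missing values, both group totals should add up to the raw sum.
c(count = sum(by_count$freq),
  aggregate = sum(by_aggregate$y),
  raw = sum(toy$y))
```

On data with missing values the two may differ, since the formula method of aggregate() applies its na.action before summing.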
Which illustrates something that I've believed for a while about
data.table - it's not the indexing that speeds things up, it's the
custom data structure. If you use ddply with data frames, it's slow
because data frames are slow. I think the right way to resolve this
is to make data frames more efficient, perhaps using some kind of
mutable interface where necessary for high-performance operations.
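As one illustration of what a mutable interface can look like, data.table's := operator adds or changes a column by reference instead of copying the whole table. This is only a sketch of that existing feature with a made-up table `dt`, not a proposal for how data frames themselves should change.

```r
library(data.table)

# Hypothetical toy table; the names are invented for illustration.
dt <- data.table(x = c("a", "b", "a"), y = c(1, 2, 3))
before <- address(dt)

# := adds the column in place, without copying the table.
dt[, y2 := y * 2]
after <- address(dt)

# The table sits at the same memory address before and after the update.
identical(before, after)
```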
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/