[R] aggregate vs tapply; is there a middle ground?

Sat Feb 11 23:44:53 CET 2006

> I faced a similar problem. Here's what I did
>
> tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=T),
>                   B=sample(letters[1:5],10,replace=T),C=rnorm(10))
> tmp1 <- with(tmp,aggregate(C,list(A=A,B=B),sum))
> tmp2 <- expand.grid(A=sort(unique(tmp$A)),B=sort(unique(tmp$B)))
> merge(tmp2,tmp1,all.x=T)
>
> At least fewer than 10 extra lines of code. Anyone with a simpler solution?
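
For comparison with the subject line: tapply already keeps every A/B combination, returning NA for pairs that never occur, so the same long-format result can be had in base R alone. A minimal sketch, assuming a current R; the set.seed call and the names cube and full are illustrative additions, not from the original post:

```r
# Toy data in the same shape as the quoted example; set.seed only for reproducibility
set.seed(1)
tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))

# tapply over two grouping vectors gives a 2-d array with one cell per A/B pair;
# pairs that never occur in the data get NA
cube <- with(tmp, tapply(C, list(A = A, B = B), sum))

# as.table + as.data.frame flattens the array back to long format,
# keeping the empty (NA) combinations and naming the value column C
full <- as.data.frame(as.table(cube), responseName = "C")
full
```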

Well, you can almost do this with the reshape package:

library(reshape)  # recast() comes from the reshape package
tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))
a <- recast(tmp, A + B ~ ., sum)
# see also recast(tmp, A ~ B, sum)
add.all.combinations(a, row = "A", cols = "B")

Here add.all.combinations basically does what you outlined above (filling
in the missing A/B combinations); it would be easy enough to generalise it
to multiple dimensions.
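
As a rough illustration of that generalisation, here is a base-R sketch of the expand.grid + merge idea from the quoted post, extended to any number of grouping columns. The helper fill_combinations and its arguments are hypothetical (not part of reshape), and it assumes the grouping columns are plain vectors or factors:

```r
# Hypothetical helper (not part of reshape): aggregate `value` by every
# column named in `by`, then add NA rows for combinations that never occur.
fill_combinations <- function(df, by, value, FUN = sum) {
  # one summary row per observed combination of the grouping columns
  agg <- aggregate(df[value], by = df[by], FUN = FUN)
  # every combination of the observed values, in any number of dimensions
  grid <- expand.grid(lapply(df[by], function(x) sort(unique(x))),
                      stringsAsFactors = FALSE)
  # left join on the grouping columns: missing combinations get NA
  merge(grid, agg, all.x = TRUE)
}

# Three grouping columns instead of two
set.seed(1)
tmp3 <- data.frame(A = sample(LETTERS[1:3], 10, replace = TRUE),
                   B = sample(letters[1:3], 10, replace = TRUE),
                   D = sample(1:2, 10, replace = TRUE),
                   C = rnorm(10))
res <- fill_combinations(tmp3, by = c("A", "B", "D"), value = "C")
res
```

Passing by = c("A", "B") gives the two-factor case from the original post.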

Hadley