[R] aggregate vs tapply; is there a middle ground?

Sat Feb 11 22:28:21 CET 2006

Dear all,

I want to do a series of comparisons among 4 categorical variables:

a <- aggregate(y, list(var1, var2, var3, var4), sum)

This gets me a very nice 2-dimensional data frame with one column per 
variable, BUT, as help(aggregate) says, "empty subsets are removed".  
I don't see in help(aggregate) how I can change this.

In contrast,

a <- tapply(y, list(var1, var2, var3, var4), sum)

gives me results for everything including empty subsets, but in an 
awkward 4-dimensional array that takes me another 10 lines of 
inefficient code to turn into a 2D data.frame.
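
To make the difference concrete, here is a small made-up example. It is only a sketch: it uses two factors rather than my four, and the data and level names are invented for illustration (they are not my real data). The combination (b, y) never occurs in it, so it is the empty subset.

```r
# Hypothetical toy data: two factors, with the (b, y) combination empty
var1 <- factor(c("a", "a", "b"), levels = c("a", "b"))
var2 <- factor(c("x", "y", "x"), levels = c("x", "y"))
y    <- c(1, 2, 3)

# aggregate: returns a 2D data frame, but the empty (b, y) row is missing
agg <- aggregate(y, list(var1 = var1, var2 = var2), sum)

# tapply: keeps a cell for (b, y), but returns an array
# with one dimension per grouping factor
tap <- tapply(y, list(var1 = var1, var2 = var2), sum)

print(agg)
print(tap)
```

With four factors the tapply result has four dimensions instead of two, which is the part I find awkward to flatten.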

Is there a way to do this calculation directly, INCLUDING results for 
empty subsets, and still obtain a 2D array, matrix, or data.frame?  
Alternatively, is there a simple way to mush the 4D result from 
tapply into a 2D matrix/data.frame?

Thanks very much in advance for any help!

-jlb

-- 
************************************
Joseph P. LeBouton
Forest Ecology PhD Candidate
Department of Forestry
Michigan State University
East Lansing, Michigan 48824

Office phone: 517-355-7744
email: lebouton at msu.edu