[R] aggregate vs tapply; is there a middle ground?

Hans Gardfjell hans.gardfjell at emg.umu.se
Sat Feb 11 23:24:48 CET 2006


I faced a similar problem. Here's what I did:

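# Toy data: two grouping factors and a numeric response
tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))
# Sums for the non-empty A/B subsets only (aggregate() drops empty ones)
tmp1 <- with(tmp, aggregate(C, list(A = A, B = B), sum))
# Every combination of the observed A and B values
tmp2 <- expand.grid(A = sort(unique(tmp$A)), B = sort(unique(tmp$B)))
# Left join: combinations with no data get NA in the sum column
merge(tmp2, tmp1, all.x = TRUE)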

At least it's fewer than 10 extra lines of code. Does anyone have a simpler solution?
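The same expand.grid() + merge() idea extends to the four grouping variables in the original question. The sketch below wraps it in a helper; the name `aggregate_all()` and its arguments are hypothetical (not part of base R or the original post), and `set.seed()` is added only to make the toy data reproducible. As in the code above, the grid is built from the observed values of each grouping column, not from unused factor levels.

```r
# Hypothetical helper (not from the original post): generalises the
# expand.grid() + merge() approach to any number of grouping columns.
aggregate_all <- function(data, value, groups, FUN = sum) {
  # Summaries for the non-empty subsets only (aggregate() drops empty ones)
  agg <- aggregate(data[value], by = data[groups], FUN = FUN)
  # Every combination of the observed values of each grouping column
  grid <- expand.grid(lapply(data[groups], function(g) sort(unique(g))),
                      stringsAsFactors = FALSE)
  # Left join on the grouping columns: empty combinations get NA
  merge(grid, agg, by = groups, all.x = TRUE)
}

set.seed(1)  # only so the example is reproducible
tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))
res <- aggregate_all(tmp, "C", c("A", "B"))
res
```

For the question's case this would be called as something like `aggregate_all(d, "y", c("var1", "var2", "var3", "var4"))`, assuming the variables sit together in a data frame `d`.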

Cheers, Hans


lebouton wrote:
>
>Dear all,
>
>I want to do a series of comparisons among 4 categorical variables:
>
>a <- aggregate(y, list(var1, var2, var3, var4), sum)
>
>This gets me a very nice 2-dimensional data frame with one column per 
>variable, BUT, as help for aggregate says, <<empty subsets are 
>removed>>.  I don't see in help(aggregate) how I can change this.
>
>In contrast,
>a <- tapply(y, list(var1, var2, var3, var4), sum)
>
>gives me results for everything including empty subsets, but in an 
>awkward 4-dimensional array that takes me another 10 lines of 
>inefficient code to turn into a 2D data.frame.
>
>Is there a way to directly do this calculation INCLUDING results for 
>empty subsets, and still obtain a 2D array, matrix, or data.frame?  OR 
>alternatively is there a simple way to mush the 4D result from the 
>tapply into a 2D matrix/data.frame?
>
>thanks very much in advance for any help!
>
>-jlb
>
>-- 
>************************************
>Joseph P. LeBouton
>Forest Ecology PhD Candidate
>Department of Forestry
>Michigan State University
>East Lansing, Michigan 48824
>
>Office phone: 517-355-7744
>email: lebouton at msu.edu
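For the second part of the question (flattening the 4-D `tapply()` result), base R can do the reshaping itself: `as.table()` marks the array as a table and `as.data.frame()` then returns one row per cell, with one column per dimension plus the value. The sketch assumes a hypothetical data frame `d` standing in for the poster's `var1` to `var4` and `y`. The `xtabs()` line is a swapped-in alternative that sums `y` directly; note that it reports empty cells as 0 rather than NA.

```r
set.seed(2)  # only so the example is reproducible
# Hypothetical data shaped like the question: four factors and a response y
d <- data.frame(var1 = sample(c("a", "b"), 20, replace = TRUE),
                var2 = sample(c("c", "d"), 20, replace = TRUE),
                var3 = sample(c("e", "f"), 20, replace = TRUE),
                var4 = sample(c("g", "h", "i"), 20, replace = TRUE),
                y    = rnorm(20))

# 4-D array of sums; cells with no observations are NA
arr <- with(d, tapply(y, list(var1 = var1, var2 = var2,
                              var3 = var3, var4 = var4), sum))

# Flatten: one row per cell, columns var1..var4 plus the sum
long <- as.data.frame(as.table(arr), responseName = "sum")

# Alternative: xtabs() with y on the left-hand side sums y per cell;
# empty cells come out as 0 instead of NA
long0 <- as.data.frame(xtabs(y ~ var1 + var2 + var3 + var4, data = d),
                       responseName = "sum")
head(long)
```

Whether NA or 0 is the better answer for an empty subset depends on the analysis: NA keeps "no data" distinct from "data summing to zero".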


-- 

*********************************
Hans Gardfjell
Ecology and Environmental Science
Umeå University
90187 Umeå, Sweden
email: hans.gardfjell at emg.umu.se
phone:  +46 907865267
mobile: +46 705984464
