[R] aggregate vs tapply; is there a middle ground?
Hans Gardfjell
hans.gardfjell at emg.umu.se
Mon Feb 13 08:52:08 CET 2006
Thanks Peter!
I had a "feeling" that there must be a simpler, better, more elegant
solution.
/Hans
Peter Dalgaard wrote:
> hadley wickham <h.wickham at gmail.com> writes:
>
>
>>> I faced a similar problem. Here's what I did
>>>
>>> tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=TRUE),
>>>                   B=sample(letters[1:5],10,replace=TRUE),
>>>                   C=rnorm(10))
>>> tmp1 <- with(tmp,aggregate(C,list(A=A,B=B),sum))
>>> tmp2 <- expand.grid(A=sort(unique(tmp$A)),B=sort(unique(tmp$B)))
>>> merge(tmp2,tmp1,all.x=TRUE)
>>>
>>> At least fewer than 10 extra lines of code. Anyone with a simpler solution?
>>>
>> Well, you can almost do this with the reshape package:
>>
>> tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=TRUE),
>>                   B=sample(letters[1:5],10,replace=TRUE),
>>                   C=rnorm(10))
>> a <- recast(tmp, A + B ~ ., sum)
>> # see also recast(tmp, A ~ B, sum)
>> add.all.combinations(a, row="A", cols = "B")
>>
>> Where add.all.combinations basically does what you outlined above --
>> it would be easy enough to generalise to multiple dimensions.
>>
>
> Anything wrong with
>
>
>> as.data.frame(with(tmp,as.table(tapply(C,list(A=A,B=B),sum))))
>>
>    A B       Freq
> 1  A a         NA
> 2  B a -0.2524320
> 3  C a  3.8539264
> 4  D a         NA
> 5  A c  0.7227294
> 6  B c -0.2694669
> 7  C c  0.4760957
> 8  D c         NA
> 9  A e         NA
> 10 B e  0.1800500
> 11 C e         NA
> 12 D e -1.0350928
>
> (except the silly colname, responseName="sum" should fix that).
>
>
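
For reference, a minimal sketch of the tapply-based suggestion quoted above, with the responseName fix applied. The set.seed() call and the names tab and res are assumptions added here for reproducibility and readability; they are not part of the quoted messages.

```r
# Assumption: a fixed seed, used only so the sketch is reproducible.
set.seed(1)

tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))

# tapply() returns a full A x B matrix; cells with no observations are NA.
tab <- with(tmp, tapply(C, list(A = A, B = B), sum))

# Converting via as.table() yields one row per A/B combination;
# responseName replaces the default "Freq" column name.
res <- as.data.frame(as.table(tab), responseName = "sum")
res
```

Only levels that actually occur in A or B appear in the result, which matches the expand.grid(sort(unique(...))) approach quoted earlier in the thread.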
--
*********************************
Hans Gardfjell
Ecology and Environmental Science
Umeå University
90187 Umeå, Sweden
email: hans.gardfjell at emg.umu.se
phone: +46 907865267
mobile: +46 705984464