[R] aggregate vs tapply; is there a middle ground?
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Sun Feb 12 00:37:46 CET 2006
hadley wickham <h.wickham at gmail.com> writes:
> > I faced a similar problem. Here's what I did
> >
> > tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=TRUE),
> >                   B=sample(letters[1:5],10,replace=TRUE),
> >                   C=rnorm(10))
> > tmp1 <- with(tmp,aggregate(C,list(A=A,B=B),sum))
> > tmp2 <- expand.grid(A=sort(unique(tmp$A)),B=sort(unique(tmp$B)))
> > merge(tmp2,tmp1,all.x=TRUE)
> >
> > At least fewer than 10 extra lines of code. Anyone with a simpler solution?
>
> Well, you can almost do this with the reshape package:
>
> tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=TRUE),
>                   B=sample(letters[1:5],10,replace=TRUE),
>                   C=rnorm(10))
> a <- recast(tmp, A + B ~ ., sum)
> # see also recast(tmp, A ~ B, sum)
> add.all.combinations(a, row="A", cols = "B")
>
> Where add.all.combinations basically does what you outlined above --
> it would be easy enough to generalise to multiple dimensions.
Anything wrong with
> as.data.frame(with(tmp,as.table(tapply(C,list(A=A,B=B),sum))))
   A B       Freq
1  A a         NA
2  B a -0.2524320
3  C a  3.8539264
4  D a         NA
5  A c  0.7227294
6  B c -0.2694669
7  C c  0.4760957
8  D c         NA
9  A e         NA
10 B e  0.1800500
11 C e         NA
12 D e -1.0350928
(except for the silly column name "Freq"; passing responseName="sum" to
as.data.frame() should fix that).
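
For the archive, here is roughly how the whole thing might look with the column renamed. This is only a sketch: the set.seed() call and the name `res` are my additions for illustration, so the numbers will not match the output above.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))

# tapply() gives a matrix with NA in the empty cells; as.table() lets
# as.data.frame() unfold it into one row per observed A x B combination.
# responseName replaces the default "Freq" column name.
res <- as.data.frame(with(tmp, as.table(tapply(C, list(A = A, B = B), sum))),
                     responseName = "sum")
res
```

Note that only levels of A and B that actually occur in the data show up, same as with the expand.grid()/merge() approach quoted above.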
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907