[R] how to make aggregation in R ?
hadley wickham
h.wickham at gmail.com
Fri Mar 20 05:15:17 CET 2009
On Thu, Mar 19, 2009 at 8:40 PM, jim holtman <jholtman at gmail.com> wrote:
> Try this technique. I use it with large data objects since it is
> sometimes faster, and uses less memory, by using indices:
>
> x <- read.table(textConnection(" v1 v2 n1 n2
> 1 a a1 1 21
> 2 a a1 2 22
> 3 a a1 3 23
> 4 a a2 4 24
> 5 a a3 5 25
> 6 b b1 6 26
> 7 b b1 7 27
> 8 b b2 8 28
> 9 b b2 9 29
> 10 b b2 10 30
> 11 c c1 11 31
> 12 c c2 12 32
> 13 c c2 13 33
> 14 c c2 14 34
> 15 c c3 15 35
> 16 d d1 16 36
> 17 d d2 17 37
> 18 d d3 18 38
> 19 d d4 19 39
> 20 d d4 20 40"), header=TRUE)
> closeAllConnections()
> # use indices to reduce memory
> x.ind <- split(seq(nrow(x)), list(x$v1, x$v2), drop=TRUE)
> # now aggregate using the indices
> x.agg <- do.call(rbind, lapply(x.ind, function(.seg){
>     data.frame(v1=x$v1[.seg[1]], v2=x$v2[.seg[1]],
>                n1=sum(x$n1[.seg]), n2=sum(x$n2[.seg]))
> }))
This is basically the approach that the plyr package,
http://had.co.nz/plyr, uses behind a user-friendly interface.
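
For comparison, here is a minimal sketch of the same aggregation written with plyr's ddply(). It assumes the plyr package is installed. The data frame is rebuilt directly rather than read from text, and the names x.base and x.plyr are made up for this illustration. A base R aggregate() call is included as a reference result, so the example still runs without plyr.

```r
# Same 20-row example data as in the quoted message, built directly.
x <- data.frame(
  v1 = rep(c("a", "b", "c", "d"), each = 5),
  v2 = c("a1", "a1", "a1", "a2", "a3",
         "b1", "b1", "b2", "b2", "b2",
         "c1", "c2", "c2", "c2", "c3",
         "d1", "d2", "d3", "d4", "d4"),
  n1 = 1:20,
  n2 = 21:40
)

# Base R reference: sum n1 and n2 within each v1/v2 combination.
x.base <- aggregate(cbind(n1, n2) ~ v1 + v2, data = x, FUN = sum)
x.base <- x.base[order(x.base$v1, x.base$v2), ]
print(x.base)

# plyr version (assumption: plyr is installed). ddply() splits x by
# v1 and v2, applies the function to each piece, and row-binds the results.
if (requireNamespace("plyr", quietly = TRUE)) {
  x.plyr <- plyr::ddply(x, c("v1", "v2"), function(df) {
    data.frame(n1 = sum(df$n1), n2 = sum(df$n2))
  })
  print(x.plyr)
}
```

Both calls should give one row per v1/v2 combination, the same grouping as the split()-on-indices code quoted above.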
Hadley
--
http://had.co.nz/