[R] How to do aggregate operations with non-scalar functions

Thu Apr 7 08:33:32 CEST 2005

On Apr 7, 2005 1:18 AM, Itay Furman <itayf at u.washington.edu> wrote:
> 
> On Tue, 5 Apr 2005, Gabor Grothendieck wrote:
> 
> > On Apr 5, 2005 6:59 PM, Itay Furman <itayf at u.washington.edu> wrote:
> >>
> >> Hi,
> >>
> >> I have a data set, the structure of which is something like this:
> >>
> >>> a <- rep(c("a", "b"), c(6,6))
> >>> x <- rep(c("x", "y", "z"), c(4,4,4))
> >>> df <- data.frame(a=a, x=x, r=rnorm(12))
> >>
> >> The true data set has >1 million rows. The factors "a" and "x"
> >> have about 70 levels each; combined together they subset 'df'
> >> into ~900 data frames.
> >> For each such subset I'd like to compute various statistics
> >> including quantiles, but I can't find an efficient way of
> 
> [snip]
> 
> >> I would like to end up with a data frame like this:
> >>
> >>   a x         0%        25%
> >> 1 a x -0.7727268  0.1693188
> >> 2 a y -0.3410671  0.1566322
> >> 3 b y -0.2914710 -0.2677410
> >> 4 b z -0.8502875 -0.6505710
> 
> [snip]
> 
> > One can use
> >
> >       do.call("rbind", by(df, list(a = a, x = x), f))
> >
> > where f is the appropriate function.
> >
> > In this case f can be described in terms of df.quantile which
> > is like quantile except it returns a one row data frame:
> >
> >       df.quantile <- function(x,p)
> >               as.data.frame(t(data.matrix(quantile(x, p))))
> >
> >       f <- function(df, p = c(0.25, 0.5))
> >               cbind(df[1,1:2], df.quantile(df[,"r"], p))
> >
> 
> Thanks!  Just what I wanted.
> 
> A minor point is that for some reason the row numbers in the
> final data frame are not sequential (see below -- this is not a
> consequence of my changes).

These are the original row numbers of the first row of
each combo of a and x.  If z is the result of do.call,
you can always renumber them:

      row.names(z) <- 1:nrow(z)

if this is needed.
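
For reference, here is the whole thing put together as one script, using the toy data from the original question. It is only a sketch: the set.seed() call is my addition so the run is repeatable, and the 12-row data frame stands in for the real data set, which I have not seen. Everything else is the code quoted above.

```r
# Toy data from the original question.
# set.seed() is an assumption added here so the run is repeatable.
set.seed(1)
a <- rep(c("a", "b"), c(6, 6))
x <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

# Like quantile(), but returns a one-row data frame
# with one column per requested probability.
df.quantile <- function(x, p)
    as.data.frame(t(data.matrix(quantile(x, p))))

# For one subset: keep the a and x values from its first row
# and append the quantiles of r.
f <- function(df, p = c(0.25, 0.5))
    cbind(df[1, 1:2], df.quantile(df[, "r"], p))

# Apply f to every (a, x) combination and stack the results.
# Combinations with no rows yield NULL, which rbind drops.
z <- do.call("rbind", by(df, list(a = a, x = x), f))

# Row names at this point are the original row numbers of the
# first row of each combination; renumber them if desired.
row.names(z) <- 1:nrow(z)

print(z)
```

To get other statistics, change the default for p (or pass a different p through by) or have f cbind further one-row data frames.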