[R] How to do aggregate operations with non-scalar functions
Gabor Grothendieck
ggrothendieck at gmail.com
Thu Apr 7 08:33:32 CEST 2005
On Apr 7, 2005 1:18 AM, Itay Furman <itayf at u.washington.edu> wrote:
>
> On Tue, 5 Apr 2005, Gabor Grothendieck wrote:
>
> > On Apr 5, 2005 6:59 PM, Itay Furman <itayf at u.washington.edu> wrote:
> >>
> >> Hi,
> >>
> >> I have a data set, the structure of which is something like this:
> >>
> >>> a <- rep(c("a", "b"), c(6,6))
> >>> x <- rep(c("x", "y", "z"), c(4,4,4))
> >>> df <- data.frame(a=a, x=x, r=rnorm(12))
> >>
> >> The true data set has >1 million rows. The factors "a" and "x"
> >> have about 70 levels each; combined together they subset 'df'
> >> into ~900 data frames.
> >> For each such subset I'd like to compute various statistics
> >> including quantiles, but I can't find an efficient way of
>
> [snip]
>
> >> I would like to end up with a data frame like this:
> >>
> >>   a x         0%        25%
> >> 1 a x -0.7727268  0.1693188
> >> 2 a y -0.3410671  0.1566322
> >> 3 b y -0.2914710 -0.2677410
> >> 4 b z -0.8502875 -0.6505710
>
> [snip]
>
> > One can use
> >
> > do.call("rbind", by(df, list(a = a, x = x), f))
> >
> > where f is the appropriate function.
> >
> > In this case f can be described in terms of df.quantile which
> > is like quantile except it returns a one row data frame:
> >
> > df.quantile <- function(x, p)
> >   as.data.frame(t(data.matrix(quantile(x, p))))
> >
> > f <- function(df, p = c(0.25, 0.5))
> >   cbind(df[1, 1:2], df.quantile(df[, "r"], p))
> >
>
> Thanks! Just what I wanted.
>
> A minor point is that for some reason the row numbers in the
> final data frame are not sequential (see below -- this is not a
> consequence of my changes).
These are the original row numbers of the first row of
each combination of a and x. If z is the result of do.call,
you can always renumber the rows with row.names(z) <- 1:nrow(z)
if that is needed.
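
For reference, here is the whole thing put together as one runnable
sketch on the toy data from the original post. The set.seed() call and
the name z are assumptions added only so the run is reproducible and
the result has a name; everything else is the code quoted above.

```r
# Toy data with the same shape as in the original post
set.seed(1)   # assumption: fixed seed, only for reproducibility
a  <- rep(c("a", "b"), c(6, 6))
x  <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

# quantile() wrapped so that it returns a one-row data frame
df.quantile <- function(x, p)
  as.data.frame(t(data.matrix(quantile(x, p))))

# One output row per subset: the a/x labels of its first row
# followed by the requested quantiles of r
f <- function(df, p = c(0.25, 0.5))
  cbind(df[1, 1:2], df.quantile(df[, "r"], p))

# by() returns NULL for the empty a/x combinations; rbind skips those
z <- do.call("rbind", by(df, list(a = a, x = x), f))

# Replace the inherited row names with sequential ones
row.names(z) <- 1:nrow(z)
z
```

Different quantiles can be requested by wrapping f, for example
function(d) f(d, p = c(0, 0.25)) in place of f in the by() call.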