[R] How to do aggregate operations with non-scalar functions
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Apr 6 04:15:19 CEST 2005
On Apr 5, 2005 6:59 PM, Itay Furman <itayf at u.washington.edu> wrote:
>
> Hi,
>
> I have a data set, the structure of which is something like this:
>
> > a <- rep(c("a", "b"), c(6,6))
> > x <- rep(c("x", "y", "z"), c(4,4,4))
> > df <- data.frame(a=a, x=x, r=rnorm(12))
>
> The true data set has >1 million rows. The factors "a" and "x"
> have about 70 levels each; combined together they subset 'df'
> into ~900 data frames.
> For each such subset I'd like to compute various statistics
> including quantiles, but I can't find an efficient way of
> doing this. Aggregate() gives me the desired structure -
> namely, one row per subset - but I can use it only to compute
> a single quantile.
>
> > aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25)
>   a x          x
> 1 a x  0.1693188
> 2 a y  0.1566322
> 3 b y -0.2677410
> 4 b z -0.6505710
>
> With by() I could compute several quantiles per subset at
> each shot, but the structure of the output is not
> convenient for further analysis and visualization.
>
> > by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25))
> a: a
> x: x
> 0% 25%
> -0.7727268 0.1693188
> ----------------------------------------------------------
> a: b
> x: x
> NULL
> ----------------------------------------------------------
>
> [snip]
>
> I would like to end up with a data frame like this:
>
>   a x         0%        25%
> 1 a x -0.7727268  0.1693188
> 2 a y -0.3410671  0.1566322
> 3 b y -0.2914710 -0.2677410
> 4 b z -0.8502875 -0.6505710
>
> I checked sweep() and apply() and didn't see how to harness
> them for that purpose.
>
> So, is there a simple way to convert the object returned
> by by() into a data.frame?
> Or, is there a better way to go with this?
> Finally, if I should roll my own coercion function: any tips?
>
One can use

    do.call("rbind", by(df, list(a = a, x = x), f))

where f is a function that returns a one-row data frame for each
subset. In this case f can be written in terms of df.quantile, which
is like quantile except that it returns a one-row data frame:

    df.quantile <- function(x, p)
        as.data.frame(t(data.matrix(quantile(x, p))))

    f <- function(df, p = c(0.25, 0.5))
        cbind(df[1, 1:2], df.quantile(df[, "r"], p))
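
For reference, here is how the pieces might fit together on the sample
data from the question. This is only a sketch: the seed, the name
`res`, passing `p = c(0, 0.25)` through by() to get the 0% and 25%
quantiles asked for, and the final row renumbering are my own
assumptions for illustration, not part of the recipe above.

```r
# Sample data with the same structure as in the question.
set.seed(1)  # arbitrary seed, only so the run is reproducible
a <- rep(c("a", "b"), c(6, 6))
x <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

# quantile() variant that returns a one-row data frame
df.quantile <- function(x, p)
    as.data.frame(t(data.matrix(quantile(x, p))))

# Per-subset summary: the group labels plus the requested quantiles
f <- function(df, p = c(0.25, 0.5))
    cbind(df[1, 1:2], df.quantile(df[, "r"], p))

# by() passes extra arguments (here p) on to f. Empty a/x combinations
# come back as NULL and are ignored by rbind().
res <- do.call("rbind", by(df, list(a = df$a, x = df$x), f, p = c(0, 0.25)))
rownames(res) <- NULL  # optional: renumber the rows 1..n
print(res)
```

The result should have one row per non-empty a/x combination, with
columns a, x, 0% and 25%, which is the layout requested in the
question.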