[R] How to do aggregate operations with non-scalar functions

Gabor Grothendieck ggrothendieck at gmail.com
Wed Apr 6 04:15:19 CEST 2005


On Apr 5, 2005 6:59 PM, Itay Furman <itayf at u.washington.edu> wrote:
> 
> Hi,
> 
> I have a data set, the structure of which is something like this:
> 
> > a <- rep(c("a", "b"), c(6,6))
> > x <- rep(c("x", "y", "z"), c(4,4,4))
> > df <- data.frame(a=a, x=x, r=rnorm(12))
> 
> The true data set has >1 million rows. The factors "a" and "x"
> have about 70 levels each; combined together they subset 'df'
> into ~900 data frames.
> For each such subset I'd like to compute various statistics
> including quantiles, but I can't find an efficient way of
> doing this.  aggregate() gives me the desired structure -
> namely, one row per subset - but I can use it only to compute
> a single quantile.
> 
> > aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25)
>   a x          x
> 1 a x  0.1693188
> 2 a y  0.1566322
> 3 b y -0.2677410
> 4 b z -0.6505710
> 
> With by() I could compute several quantiles per subset at
> each shot, but the structure of the output is not
> convenient for further analysis and visualization.
> 
> > by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25))
> a: a
> x: x
>         0%        25%
> -0.7727268  0.1693188
> ----------------------------------------------------------
> a: b
> x: x
> NULL
> ----------------------------------------------------------
> 
> [snip]
> 
> I would like to end up with a data frame like this:
> 
>   a x         0%        25%
> 1 a x -0.7727268  0.1693188
> 2 a y -0.3410671  0.1566322
> 3 b y -0.2914710 -0.2677410
> 4 b z -0.8502875 -0.6505710
> 
> I checked sweep() and apply() and didn't see how to harness
> them for that purpose.
> 
> So, is there a simple way to convert the object returned
> by by() into a data.frame?
> Or, is there a better way to go with this?
> Finally, if I should roll my own coercion function: any tips?
> 


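One can use 

	do.call("rbind", by(df, list(a = a, x = x), f))

where f is a function that takes the data frame for one subset and
returns a one-row data frame. by() yields NULL for the empty
combinations of a and x, and rbind skips those, so the result has
one row per non-empty subset.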

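In this case f can be defined in terms of df.quantile, which 
is like quantile except that it returns a one-row data frame:

	# quantile() result as a one-row data frame, one column per probability
	df.quantile <- function(x, p) 
		as.data.frame(t(data.matrix(quantile(x, p))))

	# group labels (columns 1:2 of the first row) plus the quantiles of r
	f <- function(df, p = c(0.25, 0.5))
		cbind(df[1, 1:2], df.quantile(df[, "r"], p))

The default p gives the 25% and 50% quantiles. For the 0% and 25%
columns in your desired output, pass p = c(0, 0.25) as an extra
argument to by(), which forwards it to f.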
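Putting the pieces together, the following is a minimal end-to-end sketch on the toy data from the question. The set.seed() call and the row-name reset are my additions for repeatability and tidiness, not part of the original problem, and the name res is just illustrative.

```r
# Toy data from the question; the seed is an assumption added for repeatability
set.seed(1)
a <- rep(c("a", "b"), c(6, 6))
x <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

# quantile() result as a one-row data frame, one column per probability
df.quantile <- function(x, p)
    as.data.frame(t(data.matrix(quantile(x, p))))

# Group labels (columns 1:2 of the first row) plus the quantiles of r
f <- function(df, p = c(0.25, 0.5))
    cbind(df[1, 1:2], df.quantile(df[, "r"], p))

# by() applies f to each (a, x) subset and returns NULL for empty ones;
# rbind() drops the NULLs and stacks the one-row data frames.
# p is passed through by()'s "..." to f.
res <- do.call("rbind", by(df, list(a = a, x = x), f, p = c(0, 0.25)))
rownames(res) <- NULL  # optional: replace the inherited row names with 1..n
res
```

This should give one row per non-empty subset with columns a, x, 0% and 25%. I have not timed it on anything close to a million rows, so treat it as a starting point rather than a tuned solution.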



