[R] How to do aggregate operations with non-scalar functions
Itay Furman
itayf at u.washington.edu
Wed Apr 6 00:59:01 CEST 2005
Hi,
I have a data set, the structure of which is something like this:
> a <- rep(c("a", "b"), c(6,6))
> x <- rep(c("x", "y", "z"), c(4,4,4))
> df <- data.frame(a=a, x=x, r=rnorm(12))
The real data set has more than 1 million rows. The factors "a" and
"x" have about 70 levels each; together they split 'df' into
~900 subsets.
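One way to check how many non-empty subsets the two factors actually produce is to cross-tabulate them. Below is a small sketch on the toy data above. The names counts and n_subsets are only for illustration.

```r
a <- rep(c("a", "b"), c(6, 6))
x <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

# Rows per (a, x) combination; a zero cell is a combination with no data
counts <- table(df$a, df$x)
print(counts)

# Number of non-empty subsets (4 for this toy data)
n_subsets <- sum(counts > 0)
print(n_subsets)
```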
For each such subset I'd like to compute various statistics,
including quantiles, but I can't find an efficient way of
doing this. aggregate() gives me the desired structure,
namely one row per subset, but I can use it to compute only
a single quantile at a time:
> aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25)
  a x          x
1 a x  0.1693188
2 a y  0.1566322
3 b y -0.2677410
4 b z -0.6505710
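A sketch of getting several quantiles per row out of aggregate() itself. This assumes a later R release than the one used in this post, one where aggregate() has a formula method and stores a FUN result of length > 1 as a matrix column. Flattening that matrix column then gives one row per subset with one column per quantile. The set.seed() call and the names agg/res are illustrative.

```r
set.seed(1)  # illustrative, for reproducibility only
a <- rep(c("a", "b"), c(6, 6))
x <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

# FUN returns two values per subset; aggregate() keeps them as a matrix column 'r'
agg <- aggregate(r ~ a + x, data = df, FUN = quantile, probs = c(0, 0.25))

# Flatten the matrix column into ordinary columns named "0%" and "25%"
res <- cbind(agg[c("a", "x")], agg$r)
print(res)
```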
With by() I can compute several quantiles per subset in one
call, but the output, one printed block per subset, is not
convenient for further analysis and visualization.
> by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25))
a: a
x: x
        0%        25%
-0.7727268  0.1693188
----------------------------------------------------------
a: b
x: x
NULL
----------------------------------------------------------
[snip]
I would like to end up with a data frame like this:
  a x         0%        25%
1 a x -0.7727268  0.1693188
2 a y -0.3410671  0.1566322
3 b y -0.2914710 -0.2677410
4 b z -0.8502875 -0.6505710
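The target table can also be built without aggregate() or by(). Split the data frame on both factors, compute the quantiles for each piece as a one-row data frame, and stack the pieces. Below is a minimal base-R sketch. The probs vector comes from the post; set.seed() and the names pieces, rows and out are illustrative.

```r
set.seed(1)  # illustrative, for reproducibility only
a <- rep(c("a", "b"), c(6, 6))
x <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

# One data frame per non-empty (a, x) combination; drop = TRUE skips empty ones
pieces <- split(df, list(df$a, df$x), drop = TRUE)

# For each piece: its group labels plus the quantiles, as a one-row data frame
rows <- lapply(pieces, function(d) {
  q <- quantile(d$r, probs = c(0, 0.25))
  data.frame(a = d$a[1], x = d$x[1], t(q), check.names = FALSE)
})

# Stack the one-row data frames into the final table
out <- do.call(rbind, rows)
rownames(out) <- NULL
print(out)
```

check.names = FALSE keeps the quantile labels "0%" and "25%" as column names instead of turning them into syntactic names.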
I checked sweep() and apply() and didn't see how to harness
them for that purpose.
So, is there a simple way to convert the object returned
by by() into a data.frame?
Or is there a better way to approach this?
Finally, if I have to write my own coercion function, any tips?
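A minimal sketch of such a coercion function follows. by_to_df() is a hypothetical name, not an existing R function. It relies on by() returning an array of cells indexed by the grouping factors, with NULL for empty combinations (as in the "a: b, x: x" block above). It also assumes every non-empty cell is a named numeric vector of the same length, which is what quantile() returns. To keep things simple, the sketch calls by() on the whole data frame and extracts r inside the function.

```r
# Hypothetical helper: flatten a by() result into a data frame.
# Assumes each non-empty cell is a named numeric vector of equal length.
by_to_df <- function(b) {
  # One row per cell of the grouping grid, first factor varying fastest
  keys <- expand.grid(dimnames(b), stringsAsFactors = FALSE)
  cells <- unclass(b)
  attributes(cells) <- NULL            # plain list, same (column-major) order as keys
  keep <- !vapply(cells, is.null, logical(1))  # empty combinations are NULL
  vals <- do.call(rbind, cells[keep])  # matrix, one row per non-empty cell
  out <- cbind(keys[keep, , drop = FALSE], vals)
  rownames(out) <- NULL
  out
}

set.seed(1)  # illustrative, for reproducibility only
a <- rep(c("a", "b"), c(6, 6))
x <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

b <- by(df, list(a = df$a, x = df$x),
        function(d) quantile(d$r, probs = c(0, 0.25)))
res <- by_to_df(b)
print(res)
```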
Thank you very much in advance,
Itay
----------------------------------------------------------------
itayf at u.washington.edu / +1 (206) 543 9040 / U of Washington