[Rd] aggregate.formula
Arni Magnusson
arnima at u.washington.edu
Wed May 26 20:55:42 CEST 2004
This relates to a message that Christophe Pallier sent to R-help some time ago.
Like me, he finds aggregate very useful, but its interface a little
cumbersome. I've implemented a more compact formula interface; the code is at
the bottom of this message, and here is how it compares with the usual calls:
data(ToothGrowth)
# I used to aggregate like this:
aggregate(list(len=ToothGrowth$len),
list(supp=ToothGrowth$supp,dose=ToothGrowth$dose), mean)
# Recently, I discovered a slightly shorter call:
with(ToothGrowth, aggregate(list(len=len), list(supp=supp,dose=dose),
mean))
# But aggregate.formula allows:
aggregate(len~supp*dose, data=ToothGrowth, mean)
# as well as subsetting:
aggregate(len~supp*dose, data=ToothGrowth, subset=dose<2, mean)
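For readers on a current version of R: the stats package now includes a formula method for aggregate() with a very similar interface (formula, data, FUN, subset), so the same two summaries can be sketched without defining anything yourself. This is a minimal sketch assuming a recent R installation; the object names all.means and low.means are only illustrative.

```r
## Sketch using the formula method of stats::aggregate in recent R versions
data(ToothGrowth)

## Mean len for every supp x dose combination (2 x 3 = 6 cells)
all.means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

## Same summary restricted to doses below 2, via the 'subset' argument,
## which is evaluated inside 'data'
low.means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean,
                       subset = dose < 2)

print(all.means)
print(low.means)
```

The grouping columns come first in the result, followed by the summarized response, as in the data-frame interface.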
I use * notation because the resulting cell means correspond to the model
aov(len~supp*factor(dose), data=ToothGrowth), but + notation is also
supported. The implementation is probably not top-notch, but I think many R
users would appreciate something like aggregate.formula.
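The correspondence with aov() can be checked directly: in a full-interaction model every fitted value equals the mean of its supp x dose cell, so the aggregated means and the averaged fitted values should agree. A sketch, again assuming the built-in formula method of aggregate() in recent R; the names tg, fit and fitted.means are hypothetical.

```r
data(ToothGrowth)

## '*' and '+' name the same grouping variables, so the summaries match
means.star <- aggregate(len ~ supp * dose, data = ToothGrowth, FUN = mean)
means.plus <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

## Full-interaction ANOVA: each fitted value is its cell mean
fit <- aov(len ~ supp * factor(dose), data = ToothGrowth)
tg <- ToothGrowth
tg$fitted <- fitted(fit)

## Averaging the (constant) fitted values per cell recovers the cell means
fitted.means <- aggregate(fitted ~ supp + dose, data = tg, FUN = mean)

print(all.equal(means.plus$len, fitted.means$fitted))
```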
Cheers,
Arni
---
"aggregate.formula" <-
function(formula, data=NULL, FUN=mean, subset=TRUE)
########################################################################
###
### Function: aggregate.formula
###
### Purpose:  Compute summary statistics from a formula
###
### Args:     formula is a formula like y~x
###           data is where formula terms are stored, usually a data
###             frame, list, or NULL for workspace
###           FUN is a function to compute the summary statistics
###           subset is a logical vector specifying which part of the
###             data to summarize, or TRUE to include all data
###
### Author:   Arni Magnusson <arnima at u.washington.edu>, inspired by
###           an R-help message from Christophe Pallier
###
### Returns:  Data frame containing summary statistics
###
########################################################################
{
  ## Variable names on each side of the formula; all.vars() works on the
  ## parsed formula, so it does not depend on deparse spacing
  x.str <- all.vars(formula[[2]])
  by.str <- all.vars(formula[[3]])
  ## Look variables up in 'data' first, then in the calling environment
  ## (this also handles data=NULL, i.e. workspace variables)
  env <- parent.frame()
  getvars <- function(vars)
    as.data.frame(lapply(setNames(vars, vars),
                         function(v) eval(as.name(v), data, env)))
  x <- getvars(x.str)
  by <- getvars(by.str)
  ## Evaluate 'subset' inside 'data', so subset=dose<2 works without
  ## attaching 'data' to the search path
  subset <- eval(substitute(subset), data, env)
  output <- aggregate(x[subset,,drop=FALSE], by[subset,,drop=FALSE], FUN)
  names(output) <- c(by.str, x.str)
  output
}
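The key step in a formula interface like this is splitting the formula into response and grouping variable names. A sketch of one way to do that is all.vars() applied to the two sides of the formula (f[[2]] and f[[3]]), which reads the parsed formula rather than its printed form, so +, * and : all reduce to the same names. The formulas f, g, h and k below are only illustrations.

```r
## Split a two-sided formula into response and grouping variable names
f <- len ~ supp * dose

response <- all.vars(f[[2]])   # left-hand side:  "len"
groups   <- all.vars(f[[3]])   # right-hand side: "supp" "dose"

## '+', '*' and ':' on the right-hand side give the same variable names
g <- len ~ supp + dose
h <- len ~ supp:dose

## A cbind() response yields several response names
k <- cbind(a, b) ~ supp

print(response)
print(groups)
print(all.vars(k[[2]]))
```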
More information about the R-devel mailing list