[Rd] aggregate.formula

Wed May 26 20:55:42 CEST 2004

This relates to a message from Christophe Pallier to r-help some time ago.
Like me, he finds aggregate very useful but its interface a little
cumbersome. I've implemented a more compact formula interface, included at
the bottom of this message:

 data(ToothGrowth)

 # I used to aggregate like this:
 aggregate(list(len=ToothGrowth$len),
           list(supp=ToothGrowth$supp,dose=ToothGrowth$dose), mean)

 # Recently, I discovered a slightly shorter call:
 with(ToothGrowth, aggregate(list(len=len), list(supp=supp,dose=dose),
      mean))

 # But aggregate.formula allows:
 aggregate(len~supp*dose, data=ToothGrowth, mean)
 # as well as subsetting:
 aggregate(len~supp*dose, data=ToothGrowth, subset=dose<2, mean)

I use the * notation because the resulting means correspond to the model
aov(len~supp*factor(dose), data=ToothGrowth), but the + notation is also
supported and gives the same table. The implementation is probably not
top-notch, but I think many R users would appreciate something like
aggregate.formula.
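
To illustrate why both notations lead to the same grouping, here is a
minimal sketch of how the right-hand side of the formula can be taken apart.
It mirrors the string splitting used in the function at the bottom of this
message; the helper name rhs_terms is hypothetical and exists only for this
example.

```r
# Hypothetical helper, for illustration only: return the names of the
# grouping variables on the right-hand side of a formula, treating the
# * and + operators alike
rhs_terms <- function(formula)
{
  rhs <- as.character(formula[3])       # deparsed text, e.g. "supp * dose"
  unlist(strsplit(rhs, " [\\*\\+] "))   # split on " * " or " + "
}

star <- rhs_terms(len ~ supp * dose)
plus <- rhs_terms(len ~ supp + dose)
print(star)
print(identical(star, plus))
```

Since only the variable names are extracted, the operator affects nothing
but how the call reads.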

Cheers,
Arni

---

"aggregate.formula" <-
function(formula, data=NULL, FUN=mean, subset=TRUE)
########################################################################
###                                                                    #
### Function: aggregate.formula                                        #
###                                                                    #
### Purpose:  Compute summary statistics from a formula                #
###                                                                    #
### Args:     formula is a formula like y~x                            #
###           data is where formula terms are stored, usually a data   #
###             frame, list, or NULL for workspace                     #
###           FUN is a function to compute the summary statistics      #
###           subset is a logical vector specifying which part of the  #
###             data to summarize, or TRUE to include all data         #
###                                                                    #
### Author:   Arni Magnusson <arnima at u.washington.edu>, inspired by #
###             an R-help message from Christophe Pallier              #
###                                                                    #
### Returns:  Data frame containing summary statistics                 #
###                                                                    #
########################################################################
{
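  # Deparse both sides of the formula, then split the right-hand side on
  # " * " or " + " to get the names of the grouping variables
  x.str  <- as.character(formula[2])
  by.str <- as.character(formula[3])
  by.str <- unlist(strsplit(by.str, " [\\*\\+] "))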
  if(is.null(data))
  {
    x  <- as.data.frame(get(x.str,pos=1))
    by <- as.data.frame(lapply(by.str,get,pos=1))
  }
  else if(is.data.frame(data))
  {
    x  <- eval(data)[,x.str,drop=FALSE]
    by <- eval(data)[,by.str,drop=FALSE]
  }
  else  # assume list of some sort
  {
    x  <- as.data.frame(eval(data)[x.str])
    by <- as.data.frame(eval(data)[by.str])
  }
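  # Evaluate subset within data (falling back to the caller's environment)
  # instead of attach()ing data, which would stay on the search path if
  # aggregate() stopped with an error
  rows <- eval(substitute(subset), data, parent.frame())
  output <- aggregate(x[rows,,drop=FALSE], by[rows,,drop=FALSE], FUN)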
  names(output) <- c(by.str, x.str)
  return(output)
}
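
The subset argument is passed as an unevaluated expression such as dose<2,
so its variables have to be looked up inside data. The following standalone
sketch shows one way to do that lookup with substitute() and eval(), without
touching the search path. The helper name subset_rows is hypothetical and is
not part of the function above.

```r
# Hypothetical helper, for illustration only: evaluate a subset expression
# inside data, falling back to the caller's environment for other names
subset_rows <- function(data, subset=TRUE)
{
  eval(substitute(subset), data, parent.frame())
}

data(ToothGrowth)
rows <- subset_rows(ToothGrowth, dose<2)

# Aggregate only the selected rows, using the classic list-style interface
low <- aggregate(ToothGrowth[rows,"len",drop=FALSE],
                 ToothGrowth[rows,c("supp","dose"),drop=FALSE], mean)
print(low)
```

Leaving subset at its default of TRUE selects every row, so the same
indexing works with or without a subset.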