[R] aggregate and names of factors

Christophe Pallier pallier at lscp.ehess.fr
Mon Dec 8 13:17:06 CET 2003


Hello,

I use the function 'aggregate' a lot.

One small annoyance is that the factors in the 'by' list must be named
explicitly for their names to appear in the resulting data.frame
(otherwise, they appear as Group.1, Group.2, etc.). For example, I am
forced to write:

aggregate(y,list(f1=f1,f2=f2),mean)

instead of aggregate(y,list(f1,f2),mean)

(For two factors with short names, it is not such a big deal, but I
usually have about 8 factors with long names...)
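
To make the difference concrete, here is a minimal sketch with made-up
data (the vectors y, f1 and f2 below are hypothetical, chosen only for
illustration):

```r
# Hypothetical toy data: one response and two crossed factors
y  <- c(1, 2, 3, 4)
f1 <- factor(c("a", "a", "b", "b"))
f2 <- factor(c("u", "v", "u", "v"))

# Unnamed 'by' list: grouping columns get generic names
unnamed <- aggregate(y, list(f1, f2), mean)
names(unnamed)   # "Group.1" "Group.2" "x"

# Named 'by' list: grouping columns keep the names given in the list
named <- aggregate(y, list(f1 = f1, f2 = f2), mean)
names(named)     # "f1" "f2" "x"
```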

I wrote a modified 'aggregate.data.frame' function (see the code
below) so that it parses the names of the factors and uses them in the
output data.frame. I can now type aggregate(y,list(f1,f2),mean) and the
resulting data.frame has variables with names 'f1' and 'f2'.

However, I have a few questions:

1. Is it a good idea at all? When expressions rather than variables are
   used as factors, this will probably result in a mess. Can one test
   whether an argument within a list is just a variable name or a more
   complex expression? Is there a better way?

2. I would also like to keep the name of the data when it is a
   vector, and not a data.frame. The current version transforms it into 'x'.
   I have not managed to modify this behavior, so I am forced to use
    aggregate(data.frame(y),list(f1,f2),mean)

3. I would love to have yet another version that handles formulas so
   that I could type:

   aggregate(y~f1*f2)

   I have a provisional version (see below), but it does not work very
   well.  I would be grateful for any suggestions. In particular, I
   would love to have a 'subset' parameter, as in the lm function.
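
Regarding question 1, one possible approach (a sketch only, not a
tested solution) is to inspect the unevaluated 'by' argument with
substitute() and check each element with is.name(). The helper name
by.labels is hypothetical:

```r
# Hypothetical helper: report, for each element of a list(...) call,
# its deparsed text and whether it is a bare variable name.
by.labels <- function(by) {
    # 'by' is never evaluated here; element 1 of the call is 'list' itself
    exprs  <- as.list(substitute(by))[-1]
    simple <- unname(sapply(exprs, is.name))
    # deparse() may return several strings for long expressions; join them
    labels <- unname(sapply(exprs,
                            function(e) paste(deparse(e), collapse = "")))
    list(labels = labels, simple = simple)
}

res <- by.labels(list(f1, f2 > 0))
res$labels   # "f1"     "f2 > 0"
res$simple   # TRUE FALSE
```

Elements flagged as not simple could then fall back to the default
Group.N names instead of a deparsed expression.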

Here is the small piece of code for the embryo of aggregate.formula:

my.aggregate.formula = function(formula,FUN=mean)
{
    d=model.frame(formula)

    # names of the factor columns of the model frame, as symbols
    factor.names=lapply(names(d)[sapply(d,is.factor)],as.name)
    # evaluate each symbol to retrieve the factor itself
    factor.list=lapply(factor.names,eval)
    names(factor.list)=factor.names
    aggregate(d[1],factor.list,FUN)
}
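
As a possible direction for the 'subset' wish, here is a minimal
sketch. It assumes the data live in a data.frame passed as 'data' and
that 'subset' is an ordinary logical or index vector supplied by the
caller (unlike lm, which evaluates 'subset' inside the data). The name
agg.formula.sketch and the toy data are hypothetical:

```r
# Hypothetical sketch: aggregate from a formula, with an optional subset
agg.formula.sketch <- function(formula, data, subset = NULL, FUN = mean) {
    # keep only the requested rows before building the model frame
    if (!is.null(subset))
        data <- data[subset, , drop = FALSE]
    d <- model.frame(formula, data = data)
    is.fac <- sapply(d, is.factor)
    # as.list() on a data.frame keeps the column names, so the
    # grouping columns of the result are labelled with the factor names
    aggregate(d[1], as.list(d[is.fac]), FUN)
}

# Toy data for illustration only
d0 <- data.frame(y  = c(1, 2, 3, 4),
                 f1 = factor(c("a", "a", "b", "b")))
res.all <- agg.formula.sketch(y ~ f1, d0)
res.sub <- agg.formula.sketch(y ~ f1, d0, subset = d0$y > 1)
```

Taking the factors from the model frame itself, rather than evaluating
their names, avoids depending on where the variables happen to be
defined.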



Christophe Pallier
http://www.pallier.org

---------------

Here is the code for aggregate.data.frame that recovers the names of the
factors:

my.aggregate.data.frame <- function (x, by, FUN, ...)
{
    if (!is.data.frame(x))
        x <- as.data.frame(x)

    if (!is.list(by))
        stop("`by' must be a list")

    # Deparsed arguments of the list(...) call passed as 'by'
    # (element 1 of the call is 'list' itself, so it is dropped).
    by.exprs <- sapply(substitute(by)[-1], deparse)

    if (is.null(names(by))) {
        # was: names(by) <- paste("Group", seq(along = by), sep = ".")
        names(by) <- by.exprs
    }
    else {
        # fill in only the names that were left empty
        ind <- which(nchar(names(by)) == 0)
        if (length(ind) > 0)
            names(by)[ind] <- by.exprs[ind]
    }
    y <- lapply(x, tapply, by, FUN, ..., simplify = FALSE)
    if (any(sapply(unlist(y, recursive = FALSE), length) > 1))
        stop("`FUN' must always return a scalar")
    z <- y[[1]]
    d <- dim(z)
    w <- NULL
    for (i in seq(along = d)) {
        j <- rep(rep(seq(1:d[i]), prod(d[seq(length = i - 1)]) *
            rep(1, d[i])), prod(d[seq(from = i + 1, length = length(d) -
            i)]))
        w <- cbind(w, dimnames(z)[[i]][j])
    }
    w <- w[which(!unlist(lapply(z, is.null))), ]
    y <- data.frame(w, lapply(y, unlist, use.names = FALSE))
    names(y) <- c(names(by), names(x))
    y
}
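
For question 2 (keeping the name of a vector), one option is to capture
the name with deparse(substitute(x)) before the vector is converted to
a data.frame. The wrapper below is only a sketch; its name,
aggregate.keepname, and the toy data are hypothetical:

```r
# Hypothetical wrapper: preserve the name of a vector argument
aggregate.keepname <- function(x, by, FUN, ...) {
    # text of the expression passed as 'x', e.g. "y"
    x.name <- paste(deparse(substitute(x)), collapse = "")
    if (!is.data.frame(x)) {
        x <- data.frame(x)
        names(x) <- x.name
    }
    aggregate(x, by, FUN, ...)
}

# Toy data for illustration only
y  <- c(1, 2, 3, 4)
f1 <- factor(c("a", "a", "b", "b"))
res.named <- aggregate.keepname(y, list(f1 = f1), mean)
names(res.named)   # "f1" "y"
```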



