[R] aggregate and names of factors
Christophe Pallier
pallier at lscp.ehess.fr
Mon Dec 8 13:17:06 CET 2003
Hello,
I use the function 'aggregate' a lot.
One small annoyance is that the factors in the 'by' list must be named
to get their names in the resulting data.frame (otherwise they appear
as Group.1, Group.2, etc.). For example, I am forced to write:
aggregate(y,list(f1=f1,f2=f2),mean)
instead of
aggregate(y,list(f1,f2),mean)
(With two factors with short names this is not a big deal, but I
usually have about 8 factors with long names...)
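A minimal illustration of the default naming, assuming a current version of R (the small vectors y, f1 and f2 are made-up example data):

```r
# Illustrative data (hypothetical values)
y  <- c(10, 20, 30, 40)
f1 <- factor(c("a", "a", "b", "b"))
f2 <- factor(c("u", "v", "u", "v"))

# Unnamed 'by' list: the grouping columns get generic names
names(aggregate(y, list(f1, f2), mean))            # "Group.1" "Group.2" "x"
# Named 'by' list: the grouping columns keep the given names
names(aggregate(y, list(f1 = f1, f2 = f2), mean))  # "f1" "f2" "x"
```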
I wrote a modified 'aggregate.data.frame' function (see the code
below) that parses the names of the factors and uses them in the
output data.frame. I can now type aggregate(y,list(f1,f2),mean) and
the resulting data.frame has variables named 'f1' and 'f2'.
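The mechanism behind this is substitute(): inside a function it returns the unevaluated expression of an argument, here the call list(f1, f2), and deparse() turns each argument back into text. A minimal sketch with a hypothetical helper, by.names(); note that an expression such as interaction(f1, f2) is deparsed verbatim, which is the "mess" raised in question 1.

```r
# Hypothetical helper: returns the source text of each argument of list(...)
by.names <- function(by) {
  expr <- substitute(by)   # unevaluated call, e.g. list(f1, f2)
  # Drop the function name 'list' and deparse the remaining arguments;
  # 'by' itself is never evaluated, so f1 and f2 need not exist
  vapply(as.list(expr)[-1], deparse, character(1))
}

by.names(list(f1, f2))                   # "f1" "f2"
by.names(list(f1, interaction(f1, f2)))  # "f1" "interaction(f1, f2)"
```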
However, I have a few questions:
1. Is it a good idea at all? When expressions rather than variables are
used as factors, this will probably result in a mess. Can one test
whether an argument within a list is just a variable name or a more
complex expression? Is there a better way?
2. I would also like to keep the name of the data when it is a
vector rather than a data.frame. The current version turns it into 'x'.
I have not managed to change this behavior, so I am forced to use
aggregate(data.frame(y),list(f1,f2),mean)
3. I would love to have yet another version that handles formulas, so
that I could type:
aggregate(y~f1*f2)
I have a provisional version (see below), but it does not work very
well. I would be grateful for any suggestions. In particular, I
would love to have a 'subset' parameter, as in the lm function.
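Here is the small piece of code for the embryo of aggregate.formula:
my.aggregate.formula <- function(formula, FUN = mean) {
  # model.frame() returns the response in column 1 and the variables
  # of the right-hand side in the remaining columns
  d <- model.frame(formula)
  # A data.frame is already a named list, so the factor columns can be
  # passed to 'by' directly, with their names preserved
  factor.list <- as.list(d[sapply(d, is.factor)])
  aggregate(d[1], factor.list, FUN)
}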
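One way to get a 'subset' argument is the idiom used inside lm(): capture the call with match.call(), keep only the arguments model.frame() understands, and evaluate that call in the caller's frame, so that subset is evaluated within the data. A sketch under those assumptions; my.aggregate.formula2 is a hypothetical name, and every right-hand-side variable is used as a grouping variable (y ~ f1*f2 works too, since model.frame() keeps only the variables).

```r
my.aggregate.formula2 <- function(formula, data, subset, FUN = mean) {
  # Capture the call and keep only the arguments meant for model.frame()
  mf <- match.call(expand.dots = FALSE)
  mf <- mf[c(1L, match(c("formula", "data", "subset"), names(mf), 0L))]
  mf[[1L]] <- quote(stats::model.frame)
  # Evaluate in the caller's frame; 'subset' is evaluated inside 'data'
  d <- eval(mf, parent.frame())
  # Column 1 is the response; the other columns become the named 'by' list
  aggregate(d[1], as.list(d[-1]), FUN)
}

dat <- data.frame(y  = 1:8,
                  f1 = factor(rep(c("a", "b"), 4)),
                  f2 = factor(rep(c("u", "v"), each = 4)))
my.aggregate.formula2(y ~ f1 + f2, data = dat, subset = y > 2)
```

With this data, the cells (a,u), (b,u), (a,v) and (b,v) should get the means 3, 4, 6 and 7, in columns named f1, f2 and y.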
Christophe Pallier
http://www.pallier.org
---------------
Here is the code for aggregate.data.frame that recovers the names of
the factors:
my.aggregate.data.frame <- function (x, by, FUN, ...)
{
    if (!is.data.frame(x)) {
        x <- as.data.frame(x)
    }
    if (!is.list(by))
        stop("`by' must be a list")
    if (is.null(names(by))) {
        # names(by) <- paste("Group", seq(along = by), sep = ".")
        # Name each factor after the deparsed arguments of list(...)
        names(by) <- sapply(substitute(by)[-1], deparse)
    }
    else {
        nam <- names(by)
        ind <- which(nchar(nam) == 0)
        if (length(ind) > 0) {
            # The i-th element of 'by' is argument i + 1 of the call list(...)
            names(by)[ind] <- sapply(substitute(by)[ind + 1], deparse)
        }
    }
    # Apply FUN to every column of x within each cell of the 'by' factors
    y <- lapply(x, tapply, by, FUN, ..., simplify = FALSE)
    if (any(sapply(unlist(y, recursive = FALSE), length) > 1))
        stop("`FUN' must always return a scalar")
    z <- y[[1]]
    d <- dim(z)
    w <- NULL
    # Rebuild the factor-level combinations that label each cell of z
    for (i in seq(along = d)) {
        j <- rep(rep(seq(1:d[i]), prod(d[seq(length = i - 1)]) *
            rep(1, d[i])), prod(d[seq(from = i + 1, length = length(d) -
            i)]))
        w <- cbind(w, dimnames(z)[[i]][j])
    }
    # Drop empty cells (factor combinations with no observations)
    w <- w[which(!unlist(lapply(z, is.null))), ]
    y <- data.frame(w, lapply(y, unlist, use.names = FALSE))
    names(y) <- c(names(by), names(x))
    y
}
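Regarding questions 1 and 2, a possible sketch (a hypothetical wrapper, not a drop-in replacement for aggregate): is.name() tells whether an argument of list(...) is a bare variable name, so only those are used as column names and other expressions fall back to the usual Group.i; the same test on substitute(x) keeps the name of a vector passed as data. It assumes 'by' is written literally as list(...) in the call; otherwise every unnamed factor gets Group.i.

```r
# Hypothetical wrapper around aggregate(): names unnamed 'by' entries after
# bare variable names, and names a vector 'x' after its own symbol
my.aggregate <- function(x, by, FUN, ...) {
  x.expr <- substitute(x)
  by.expr <- substitute(by)
  if (!is.list(by)) stop("'by' must be a list")
  nm <- names(by)
  if (is.null(nm)) nm <- character(length(by))
  # Argument expressions are only recoverable when 'by' is written as list(...)
  args <- if (is.call(by.expr) && identical(by.expr[[1]], as.name("list")))
    as.list(by.expr)[-1] else vector("list", length(by))
  for (i in seq_along(by)) {
    if (nchar(nm[i]) == 0) {
      # is.name() is TRUE for a bare variable, FALSE for a call like f(a, b)
      nm[i] <- if (is.name(args[[i]])) as.character(args[[i]]) else paste0("Group.", i)
    }
  }
  names(by) <- nm
  # Keep the name of a vector given as data, e.g. 'y' instead of 'x'
  if (!is.data.frame(x)) {
    x <- data.frame(x)
    if (ncol(x) == 1 && is.name(x.expr)) names(x) <- as.character(x.expr)
  }
  aggregate(x, by, FUN, ...)
}

y  <- c(1, 2, 3, 4)
f1 <- factor(c("a", "a", "b", "b"))
f2 <- factor(c("u", "v", "u", "v"))
my.aggregate(y, list(f1, f2), mean)   # columns f1, f2, y
```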