[Rd] tapply

Sun Sep 4 02:22:25 CEST 2005

compared to by tapply has the nice property that the output is a
multidimensional array. But in its standard form it only accepts one
vector, a list of factors, and a function of one argument.
Then it splits the vector according to the factor(s) and
applies the function to each of subsets created by the split.
Therefore, it can not be used to compute weighted means
with one vector containing the data and another vector containing
the weights.

There are other ways of computing groupwise weighted means,
but they do not return multidimensional arrays, but lists.

Therefore, I rewrote tapply to handle a list of vectors as its first
argument. If the list contains n vectors, the function argument has to
be a function of n arguments. Then, all the vectors are split according
to the factor list, and the function is applied to each of the subsets
defined this way.

Following is the code for mtapply doing this.
It could be used as a replacement for tapply, since it
handles vector arguments just like the current implementation of tapply
does.

Perhaps other list members are interested in testing this function,
and perhaps there is even interest in including this extended version of
tapply in the R distribution.

#######################################

mtapply<-function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
{
    FUN <- if (!is.null(FUN))
        match.fun(FUN)
    if (!is.list(INDEX))
        INDEX <- list(INDEX)
    nI <- length(INDEX)
    namelist <- vector("list", nI)
    names(namelist) <- names(INDEX)
    extent <- integer(nI)
    nx <- ifelse(is.list(X),length(X[[1]]),length(X))
    one <- as.integer(1)
    group <- rep.int(one, nx)
    ngroup <- one
    for (i in seq(INDEX)) {
        index <- as.factor(INDEX[[i]])
        if (length(index) != nx)
            stop("arguments must have same length")
        namelist[[i]] <- levels(index)
        extent[i] <- nlevels(index)
        group <- group + ngroup * (as.integer(index) - one)
        ngroup <- ngroup * nlevels(index)
    }
    if (is.null(FUN))
        return(group)
     if (!is.list(X)) {
    ans <- lapply(split(X, group), FUN, ...)
    index <- as.numeric(names(ans))
     }
     else {
    myargs<-vector("list",length(X)+1)
     for (i in 1:length(X)) myargs[[i+1]]<-split(X[[i]],group)
     myargs[[1]]<-FUN
     ans<-do.call(mapply,myargs)
     ansx <- lapply(myargs[[2]],length)
     index <- as.numeric(names(ansx))
     }
    if (simplify && all(unlist(lapply(ans,length)) == 1)) {
        ansmat <- array(dim = extent, dimnames = namelist)
        if (is.list(ans)) ans <- unlist(ans, recursive = FALSE)
    }
    else {
        ansmat <- array(vector("list", prod(extent)), dim = extent,
            dimnames = namelist)
    }
    names(ans) <- NULL
    ansmat[index] <- ans
    ansmat
}

###########################################

-- 
Erich Neuwirth, Didactic Center for Computer Science
University of Vienna
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39902 Fax: +43-1-4277-9399