[Rd] Using 'dimname names' in aperm() and apply()

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Jul 29 23:35:58 CEST 2010


It's not that simple. These are base functions so

- adding utility functions to base is undesirable
- efficiency matters
- any change (including adding a function!) needs corresponding
   documentation.
- this needs much better error checking.
- dimnum.from.dimnamename is rather inefficient: a simple call to
   match() will do the job.  (And we do have seq_along !)

I've been tidying up aperm(), and there this is as simple to do at C 
level (and more efficient).  For apply(), something like

     if (is.character(MARGIN)) {
         if(is.null(dnn <- names(dn)))
            stop("'X' must have named dimnames")
         MARGIN <- match(MARGIN, dnn)
         if (any(is.na(MARGIN)))
             stop("not all elements of 'perm' are names of dimensions")
     }


Thanks for the suggestions: a version will appear in R-devel in due 
course.

On Thu, 29 Jul 2010, Michael Lachmann wrote:

> I think that the "dimname names" of tables and arrays could make
> aperm() and apply() (and probably some other functions) easier to use.
> (dimname names are, for example, created by table() )
>
> The use would be something like:
> --
> x <-table( from=sample(3,100,rep=T), to=sample(5,100,rep=T))
> trans <- x / apply(x,"from",sum)

Here for efficiency use rowSums() ....

> y <- aperm( trans, c("from","to") )
> z <- aperm(y, c("to","from") )
>
> res <-apply( y, "to", sum)
> --
>
> This makes the array much easier to handle than having to keep track
> which dimension currently means what.
>
> For aperm and apply, the change seems very simple - one new function,
> and an additional line in each.
> ----------
> dimnum.from.dimnamename <- function(A, dimensions)
> {
>
>  if( is.character(dimensions) ) {
>    n <- names(dimnames(A))
>    if( !is.null(n) ) {
>        dimnum <- seq( along=n)
>        names(dimnum) <-  n
>        dimensions <- dimnum[dimensions]
>      }
>  }
>  dimensions
> }
>
>
>
> aperm <- function (a, perm, resize = TRUE)
> {
>    if (missing(perm))
>        perm <- integer(0L)
>    perm <- dimnum.from.dimnamename( a, perm) # this line was added to aperm
>    .Internal(aperm(a, perm, resize))
> }
>
> apply <-  function (X, MARGIN, FUN, ...)
> {
>    FUN <- match.fun(FUN)
>    d <- dim(X)
>    dl <- length(d)
>    if (dl == 0L)
>        stop("dim(X) must have a positive length")
>    ds <- 1L:dl
>    if (length(oldClass(X)))
>        X <- if (dl == 2)
>            as.matrix(X)
>        else as.array(X)
>    d <- dim(X)
>    dn <- dimnames(X)
>
>
>    MARGIN <- dimnum.from.dimnamename( X,MARGIN ) # this line was added to apply
>
>    s.call <- ds[-MARGIN]
>    s.ans <- ds[MARGIN]
>    d.call <- d[-MARGIN]
>    d.ans <- d[MARGIN]
>    dn.call <- dn[-MARGIN]
>    dn.ans <- dn[MARGIN]
>    d2 <- prod(d.ans)
>    if (d2 == 0L) {
>        newX <- array(vector(typeof(X), 1L), dim = c(prod(d.call),
>            1L))
>        ans <- FUN(if (length(d.call) < 2L)
>            newX[, 1]
>        else array(newX[, 1L], d.call, dn.call), ...)
>        return(if (is.null(ans)) ans else if (length(d.ans) <
>            2L) ans[1L][-1L] else array(ans, d.ans, dn.ans))
>    }
>    newX <- aperm(X, c(s.call, s.ans))
>    dim(newX) <- c(prod(d.call), d2)
>    ans <- vector("list", d2)
>    if (length(d.call) < 2L) {
>        if (length(dn.call))
>            dimnames(newX) <- c(dn.call, list(NULL))
>        for (i in 1L:d2) {
>            tmp <- FUN(newX[, i], ...)
>            if (!is.null(tmp))
>                ans[[i]] <- tmp
>        }
>    }
>    else for (i in 1L:d2) {
>        tmp <- FUN(array(newX[, i], d.call, dn.call), ...)
>        if (!is.null(tmp))
>            ans[[i]] <- tmp
>    }
>    ans.list <- is.recursive(ans[[1L]])
>    l.ans <- length(ans[[1L]])
>    ans.names <- names(ans[[1L]])
>    if (!ans.list)
>        ans.list <- any(unlist(lapply(ans, length)) != l.ans)
>    if (!ans.list && length(ans.names)) {
>        all.same <- sapply(ans, function(x) identical(names(x),
>            ans.names))
>        if (!all(all.same))
>            ans.names <- NULL
>    }
>    len.a <- if (ans.list)
>        d2
>    else length(ans <- unlist(ans, recursive = FALSE))
>    if (length(MARGIN) == 1L && len.a == d2) {
>        names(ans) <- if (length(dn.ans[[1L]]))
>            dn.ans[[1L]]
>        return(ans)
>    }
>    if (len.a == d2)
>        return(array(ans, d.ans, dn.ans))
>    if (len.a && len.a%%d2 == 0L) {
>        if (is.null(dn.ans))
>            dn.ans <- vector(mode = "list", length(d.ans))
>        dn.ans <- c(list(ans.names), dn.ans)
>        return(array(ans, c(len.a%/%d2, d.ans), if (!all(sapply(dn.ans,
>            is.null))) dn.ans))
>    }
>    return(ans)
> }
> ----------
>
> Thanks,
>
> Michael
>
>
> --
> Michael Lachmann, Max Planck institute of evolutionary anthropology
> Deutscher Platz. 6, 04103 Leipzig, Germany
> Tel: +49-341-3550521, Fax: +49-341-3550555
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-devel mailing list