[Rd] RFC: sapply() limitation from vector to matrix, but not further

Mon Jan 3 16:21:41 CET 2011

>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Tue, 28 Dec 2010 20:06:07 +0100 writes:

    > On Tue, Dec 28, 2010 at 19:14, Tony Plate <tplate at acm.org>
    > wrote:
    >> The abind() function from the abind package is an
    >> alternative here -- it can take a list argument, which
    >> makes it easy to use with the result of lapply().  It's
    >> also able take direction about which dimension to join
    >> on.
    >> 
    >>> x <- list(a=1,b=2,c=3) f <- function(v) matrix(v,
    >>> nrow=2, ncol=4) sapply(x, f)
    >>     a b c [1,] 1 2 3 [2,] 1 2 3 [3,] 1 2 3 [4,] 1 2 3
    >> [5,] 1 2 3 [6,] 1 2 3 [7,] 1 2 3 [8,] 1 2 3
    >>> 
    >>> # The 'along=' argument to abind() determines on which
    >>> dimension # the list elements are joined.  Use a
    >>> fractional value to put the new # dimension between
    >>> existing ones.
    >>> 
    >>> dim(abind(lapply(x, f), along=0))
    >> [1] 3 2 4
    >>> dim(abind(lapply(x, f), along=1.5))
    >> [1] 2 3 4
    >>> dim(abind(lapply(x, f), along=3))
    >> [1] 2 4 3
    >>> abind(lapply(x, f), along=3)
    >> , , a
    >> 
    >>     [,1] [,2] [,3] [,4] [1,]    1    1    1    1 [2,]  
    >>  1    1    1    1
    >> 
    >> , , b
    >> 
    >>     [,1] [,2] [,3] [,4] [1,]    2    2    2    2 [2,]  
    >>  2    2    2    2
    >> 
    >> , , c
    >> 
    >>     [,1] [,2] [,3] [,4] [1,]    3    3    3    3 [2,]  
    >>  3    3    3    3
    >> 

    > Thank you, Tony.
    > Indeed, yes, abind() is nice here (and in the good ol' APL
    > spirit !)

    > Wanting to keep things both simple *and* fast here, of
    > course, hence I currently contemplate the following code,
    > where the new simplify2array() is considerably simpler
    > than abind():

>     ##' "Simplify" a list of commonly structured components into an array.
>     ##'
>     ##' @title simplify list() to an array if the list elements are structurally equal
>     ##' @param x a list, typically resulting from lapply()
>     ##' @param higher logical indicating if an array() of "higher rank"
>     ##'  should be returned when appropriate, namely when all elements of
>     ##' \code{x} have the same \code{\link{dim}()}ension.
>     ##' @return x itself, or an array if the simplification "is sensible"
>     simplify2array <- function(x, higher = TRUE)
>     {
> 	if(length(common.len <- unique(unlist(lapply(x, length)))) > 1L)
> 	    return(x)
> 	if(common.len == 1L)
> 	    unlist(x, recursive = FALSE)
> 	else if(common.len > 1L) {
> 	    n <- length(x)
> 	    ## make sure that array(*) will not call rep() {e.g. for 'call's}:
> 	    r <- as.vector(unlist(x, recursive = FALSE))
> 	    if(higher && length(c.dim <- unique(lapply(x, dim))) == 1 &&
> 	       is.numeric(c.dim <- c.dim[[1L]]) &&
> 	       prod(d <- c(c.dim, n)) == length(r)) {

> 		iN1 <- is.null(n1 <- dimnames(x[[1L]]))
> 		n2 <- names(x)
> 		dnam <-
> 		    if(!(iN1 && is.null(n2)))
> 			c(if(iN1) rep.int(list(n1), length(c.dim)) else n1,
> 			  list(n2)) ## else NULL
> 		array(r, dim = d, dimnames = dnam)

> 	    } else if(prod(d <- c(common.len, n)) == length(r))
> 		array(r, dim = d,
> 		      dimnames= if(!(is.null(n1 <- names(x[[1L]])) &
> 		      is.null(n2 <- names(x)))) list(n1,n2))
> 	    else x
> 	}
> 	else x
>     }

>     sapply <- function(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
>     {
> 	FUN <- match.fun(FUN)
> 	answer <- lapply(X, FUN, ...)
> 	if(USE.NAMES && is.character(X) && is.null(names(answer)))
> 	    names(answer) <- X
> 	if(!identical(simplify, FALSE) && length(answer))
> 	    simplify2array(answer, higher = (simplify == "array"))
> 	else answer
>     }

As some may have noted, the above has been committed to R-devel
   (r53886 | maechler | 2010-12-29 10:36:01 +0100)

with the extra
------------------------------------------------------
NOTE: vapply() and replicate() doing that *by default*
------------------------------------------------------
which means that I've deared to let vapply() and replicate()
behave logically (in the above sense) by default, i.e.
*not* back compatibly.

If you want to remain bug-compatible (:-),
for replicate() you can explicitly ask for  'simplify=TRUE'
instead of the new default simplify="array".
For vapply(), the extra work of implementing such a back/bug
compatibility option did not seem worth; in particular, as
vapply() is very new and not used in many places (on CRAN)
anyway.

The new replicate() default behavior has lead to two CRAN
packages ('emoa', 'plsRglm' whose authors I'll address privately)
to fail 'R CMD check'; inspection however shows that in both cases, 

1) the check failure is from examples / test functions

2) the usage there being

	t(replicate(N, foobar()))

where foobar() returns a 1D array instead of a vector, so one
way to "fix" the problem would be to change the above to

	t(replicate(N, t(foobar())))

So, in summary, the changed behavior of replicate() seems indeed
more logical insofar as it revealed programming/usage glitches
in other parts of R code.

BTW: The above makes me considering --- once again -- extending the
     definition of  t(a) to arrays a of array-rank {:= length(dim(a))} >= 3,
     and there generalize t(.) to be the same as 
     aperm(., rev(seq_along(dim(.))))

     {.. in the APL tradition where t() and aperm() really where the same}

Martin