[Rd] RFC: sapply() limitation from vector to matrix, but not further

William Dunlap wdunlap at tibco.com
Wed Dec 1 17:56:26 CET 2010


> -----Original Message-----
> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Hadley Wickham
> Sent: Wednesday, December 01, 2010 6:27 AM
> To: Martin Maechler
> Cc: R-devel at stat.math.ethz.ch
> Subject: Re: [Rd] RFC: sapply() limitation from vector to 
> matrix,but not further
> 
> I think an even better approach would be to extract the
> "simplification" component out of sapply, so that you could write
> 
> sapply <- function(...) simplify(lapply(...))
> 
> (although obviously some arguments would go to lapply and 
> some to simplify).
> 
> The advantage of this would be that you could use the same
> simplification algorithm in other places.

A downside of that approach is that lapply(X,...) allocates
an intermediate list holding length(X) SEXPs, one per result,
which is a lot of unneeded memory.  simplify() would toss those
SEXPs out afterwards, but the peak memory usage would remain
high.  sapply() can be written to avoid the intermediate list
structure.
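
For concreteness, the decomposition under discussion might look
roughly like the sketch below. Both simplify_sketch() and
sapply_sketch() are hypothetical names made up for illustration,
not existing R functions, and the stacking rule is only one
plausible choice. The point is that the full list returned by
lapply() exists in memory before any simplification happens.

```r
# Hypothetical helper (not part of R): stack equally shaped results
# into one array whose last dimension indexes the list elements.
simplify_sketch <- function(xs) {
  # Shape of each element: dim() for arrays, length() for plain vectors.
  dims <- lapply(xs, function(x) if (is.null(dim(x))) length(x) else dim(x))
  if (length(unique(dims)) != 1L) return(xs)   # empty or ragged: keep the list
  d <- dims[[1L]]
  dn <- dimnames(xs[[1L]])
  if (is.null(dn)) dn <- vector("list", length(d))
  array(unlist(xs, use.names = FALSE),
        dim = c(d, length(xs)),
        dimnames = c(dn, list(names(xs))))
}

# Hypothetical wrapper: the intermediate list from lapply() is fully
# built here and only then handed to the simplifier.
sapply_sketch <- function(X, FUN, ...) simplify_sketch(lapply(X, FUN, ...))

v <- c(A = 50, B = 60)
res <- sapply_sketch(v, function(x) outer(rep(x, 3), 1:2))
dim(res)
```

With equally sized matrices as results, this returns a rank-3
array; anything ragged falls through unchanged as a list.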

vapply() can avoid the intermediate list structure because
it knows what the output of FUN will look like and can
put the results directly into the desired output structure.
Perhaps its processing of the FUN.VALUE argument could be
beefed up so that matrices would be stacked as you want.
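
As a point of comparison, and assuming a reasonably recent R
release rather than the version current when this was written:
the vapply() documentation now says that an array-valued FUN.VALUE
gives a result with dim c(dim(FUN.VALUE), length(X)), and sapply()
accepts simplify = "array". The function f and the object names
below are illustrative only.

```r
# Assumes a recent R release; f, template, a and b are illustrative names.
v <- c(A = 50, B = 60, C = 70, D = 80)

# Each call returns a 3 x 5 numeric matrix.
f <- function(x) outer(rep(x, 3), 2 * (1:5))

# A 3 x 5 double matrix as FUN.VALUE tells vapply() the shape and type
# of every result, so it can fill the output array directly.
template <- matrix(numeric(1), nrow = 3, ncol = 5)
a <- vapply(v, f, FUN.VALUE = template)
dim(a)

# sapply() with simplify = "array" stacks the matrices the same way.
b <- sapply(v, f, simplify = "array")
dim(b)
```

Both calls should stack the four matrices along a third dimension
named after the elements of v.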

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> 
> Hadley
> 
> On Wed, Dec 1, 2010 at 8:39 AM, Martin Maechler
> <maechler at stat.math.ethz.ch> wrote:
> > sapply() stems from S / S+ times and hence has a long tradition.
> > In spite of that I think that it should be enhanced...
> >
> > As the subject mentions, sapply() produces a matrix in cases
> > where the list components of the lapply(.) results are of the
> > same length (and ...).
> > However, it unfortunately "stops there".
> > E.g., if you *nest* two sapply() calls where the inner one
> > produces a matrix, very often the logical behavior would be for
> > the outer sapply() to stack these matrices into an array of
> > rank 3 ["array rank"(x) := length(dim(x))].
> > However it does not do that, e.g., an artificial example
> >
> > p0 <- function(...) paste(..., sep="")
> > myF <- function(x,y) {
> >    stopifnot(length(x) <= 3)
> >    x <- rep(x, length.out=3)
> >    ny <- length(y)
> >    r <- outer(x,y)
> >    dimnames(r) <- list(p0("r",1:3), p0("C", seq_len(ny)))
> >    r
> > }
> >
> > and
> >
> >> (v <- structure(10*(5:8), names=LETTERS[1:4]))
> >  A  B  C  D
> > 50 60 70 80
> >
> > If we tell sapply() not to simplify, we see the list of same-size
> > matrices it produces:
> >
> >> sapply(v, myF, y = 2*(1:5), simplify=FALSE)
> > $A
> >    C1  C2  C3  C4  C5
> > r1 100 200 300 400 500
> > r2 100 200 300 400 500
> > r3 100 200 300 400 500
> >
> > $B
> >    C1  C2  C3  C4  C5
> > r1 120 240 360 480 600
> > r2 120 240 360 480 600
> > r3 120 240 360 480 600
> >
> > $C
> >    C1  C2  C3  C4  C5
> > r1 140 280 420 560 700
> > r2 140 280 420 560 700
> > r3 140 280 420 560 700
> >
> > $D
> >    C1  C2  C3  C4  C5
> > r1 160 320 480 640 800
> > r2 160 320 480 640 800
> > r3 160 320 480 640 800
> >
> > However, quite deceptively
> >
> >> sapply(v, myF, y = 2*(1:5))
> >        A   B   C   D
> >  [1,] 100 120 140 160
> >  [2,] 100 120 140 160
> >  [3,] 100 120 140 160
> >  [4,] 200 240 280 320
> >  [5,] 200 240 280 320
> >  [6,] 200 240 280 320
> >  [7,] 300 360 420 480
> >  [8,] 300 360 420 480
> >  [9,] 300 360 420 480
> > [10,] 400 480 560 640
> > [11,] 400 480 560 640
> > [12,] 400 480 560 640
> > [13,] 500 600 700 800
> > [14,] 500 600 700 800
> > [15,] 500 600 700 800
> >
> >
> > My proposal -- implemented and "make check" tested --
> > is to add an optional argument  'ARRAY'
> > which allows
> >
> >> sapply(v, myF, y = 2*(1:5), ARRAY=TRUE)
> > , , A
> >
> >    C1  C2  C3  C4  C5
> > r1 100 200 300 400 500
> > r2 100 200 300 400 500
> > r3 100 200 300 400 500
> >
> > , , B
> >
> >    C1  C2  C3  C4  C5
> > r1 120 240 360 480 600
> > r2 120 240 360 480 600
> > r3 120 240 360 480 600
> >
> > , , C
> >
> >    C1  C2  C3  C4  C5
> > r1 140 280 420 560 700
> > r2 140 280 420 560 700
> > r3 140 280 420 560 700
> >
> > , , D
> >
> >    C1  C2  C3  C4  C5
> > r1 160 320 480 640 800
> > r2 160 320 480 640 800
> > r3 160 320 480 640 800
> >
> >>
> > -----------
> >
> > In the best of all worlds, the default would be 'ARRAY = TRUE',
> > but of course, given the long-standing different behavior,
> > it seems much too "risky", and my proposal includes remaining
> > back-compatible with default ARRAY = FALSE.
> >
> > Martin Maechler,
> > ETH Zurich
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> 
> 
> 
> -- 
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
> 
> 


