[Rd] iterated lapply

Martin Maechler maechler at lynne.stat.math.ethz.ch
Thu Feb 26 12:11:01 CET 2015


>>>>> Michael Weylandt <michael.weylandt at gmail.com>
>>>>>     on Wed, 25 Feb 2015 21:43:36 -0500 writes:

    >> On Feb 25, 2015, at 5:35 PM, Benjamin Tyner
    >> <btyner at gmail.com> wrote:
    >> 
    >> Actually, it depends on the number of cores:

    > Under current semantics, yes. Each 'stream' of function
    > calls is lazily capturing the last value of `i` on that
    > core.

    > Under Luke's proposed semantics (IIUC), the result would
    > be the same (2,4,6,8) for both parallel and serial
    > execution. This is what allows for 'drop-in' parallelism.

    >>> fun1 <- function(c){function(i){c*i}}
    >>> fun2 <- function(f) f(2)
    >>> sapply(mclapply(1:4, fun1, mc.cores=1L), fun2)
    >> [1] 8 8 8 8
    >>> sapply(mclapply(1:4, fun1, mc.cores=2L), fun2)
    >> [1] 6 8 6 8
    >>> sapply(mclapply(1:4, fun1, mc.cores=4L), fun2)
    >> [1] 2 4 6 8
    >> 
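The core dependence can be reproduced serially with a small sketch. It rests on two assumptions that this thread does not state. First, prescheduled mclapply() deals element j to core (j - 1) %% mc.cores + 1. Second, each core then runs a lazy, loop-based lapply() over its share. `lazy_lapply()` and `simulate_cores()` are hypothetical helpers written only for illustration. Under those assumptions each closure sees the last element handled on its own core, which gives Benjamin's three results:

```r
# Hypothetical stand-in for a loop-based lapply(): FUN's argument stays an
# unforced promise for X[[i]], evaluated only when the closure first uses it,
# i.e. after the loop has finished and i holds its last value.
lazy_lapply <- function(X, FUN) {
  rval <- vector("list", length(X))
  for (i in seq_along(X))
    rval[i] <- list(FUN(X[[i]]))
  rval
}

# Assumed model of prescheduled mclapply(): element j goes to "core"
# (j - 1) %% cores + 1, each core runs lazy_lapply() on its own share,
# and the results are put back in the original order.
simulate_cores <- function(X, FUN, cores) {
  core_of <- (seq_along(X) - 1L) %% cores + 1L
  out <- vector("list", length(X))
  for (k in unique(core_of)) {
    idx <- which(core_of == k)
    out[idx] <- lazy_lapply(X[idx], FUN)
  }
  out
}

fun1 <- function(c) { function(i) { c * i } }
fun2 <- function(f) f(2)

sapply(simulate_cores(1:4, fun1, cores = 1L), fun2)  # 8 8 8 8
sapply(simulate_cores(1:4, fun1, cores = 2L), fun2)  # 6 8 6 8
sapply(simulate_cores(1:4, fun1, cores = 4L), fun2)  # 2 4 6 8
```

With four cores every share has a single element, so the last value of i on each core happens to be the right one. That is why only that case agrees with the intended 2 4 6 8.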

Thank you, Michael and Benjamin.

I strongly agree with your statements, and that it is very desirable
for these mclapply() calls to behave the same as lapply().

So indeed, changes along the lines of Luke's proposal, for lapply(),
for mclapply() *and* for the other *apply() versions in the parallel
package where needed (??), are very desirable.

In my teaching, and in our CRAN package 'simsalapar', we recommend
that useRs organize their computations such that lapply() is used
serially for preliminary testing and mclapply() etc. are used for the
heavyweight computations.
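Here is a minimal sketch of that workflow. It assumes a POSIX system where mclapply() can fork, and on Windows it falls back to lapply(). `apply_maybe_parallel()` and `sim_one()` are hypothetical names used only for illustration. They are not part of simsalapar or of the parallel package. The serial test run and the heavyweight run go through the same call, so their results can be compared directly. That comparison is only meaningful if lapply() and mclapply() have the same semantics:

```r
library(parallel)

# Hypothetical helper: one call site, serial lapply() by default,
# mclapply() when parallel = TRUE and forking is available.
apply_maybe_parallel <- function(X, FUN, ..., parallel = FALSE,
                                 cores = getOption("mc.cores", 2L)) {
  if (parallel && .Platform$OS.type != "windows")
    mclapply(X, FUN, ..., mc.cores = cores)
  else
    lapply(X, FUN, ...)
}

# Hypothetical simulation task; each task sets its own seed, so its
# result does not depend on which process ends up running it.
sim_one <- function(seed, n) {
  set.seed(seed)
  mean(rnorm(n))
}

# Preliminary serial test run
res_serial <- apply_maybe_parallel(1:8, sim_one, n = 100)
# Heavyweight run: same code, now forked over 2 cores
res_par <- apply_maybe_parallel(1:8, sim_one, n = 100,
                                parallel = TRUE, cores = 2L)

stopifnot(identical(res_serial, res_par))
```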

Best,
Martin Maechler

> >> On Feb 24, 2015, at 10:50 AM, <luke-tierney at uiowa.edu> wrote:
> >> >
> >> > The documentation is not specific enough on the intended semantics in
> >> > this situation to consider this a bug. The original R-level
> >> > implementation of lapply was
> >> >
> >> >    lapply <- function(X, FUN, ...) {
> >> >        FUN <- match.fun(FUN)
> >> >        if (!is.list(X))
> >> >            X <- as.list(X)
> >> >        rval <- vector("list", length(X))
> >> >        for(i in seq(along = X))
> >> >            rval[i] <- list(FUN(X[[i]], ...))
> >> >        names(rval) <- names(X)           # keep `names' !
> >> >        return(rval)
> >> >    }
> >> >
> >> > and the current internal implementation is consistent with this. With
> >> > a loop like this, lazy evaluation and binding assignment interact in
> >> > this way; the force() function was introduced to help with this.
> >> >
> >> > That said, the expression FUN(X[[i]], ...) could be replaced by
> >> >
> >> >    local({
> >> >        i <- i
> >> >        list(FUN(X[[i]], ...))
> >> >    })
> >> >
> >> > which would produce the more desirable result
> >> >
> >> >    > sapply(test, function(myfn) myfn(2))
> >> >    [1] 2 4 6 8
> >> >
> >>
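The effect of Luke's proposal can be checked with two R-level versions of the loop he quotes. `lapply_loop()` follows the original implementation, with seq_along(X) in place of seq(along = X). `lapply_local()` wraps each call in the local({ i <- i; ... }) block he suggests. Both names are hypothetical, and this is a sketch of the idea, not the code that went into R. In the first version every promise for X[[i]] is evaluated with the final value of i. In the second, each iteration evaluates its promise against its own copy of i:

```r
# Loop-based lapply() as in the original R-level implementation.
lapply_loop <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)
  if (!is.list(X))
    X <- as.list(X)
  rval <- vector("list", length(X))
  for (i in seq_along(X))
    rval[i] <- list(FUN(X[[i]], ...))   # promise refers to the shared loop variable i
  names(rval) <- names(X)
  rval
}

# Same loop with the proposed local() wrapper around each call.
lapply_local <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)
  if (!is.list(X))
    X <- as.list(X)
  rval <- vector("list", length(X))
  for (i in seq_along(X))
    rval[i] <- local({
      i <- i                        # fresh binding of i for this iteration only
      list(FUN(X[[i]], ...))        # promise now refers to that fresh binding
    })
  names(rval) <- names(X)
  rval
}

fun1 <- function(c) function(i) c * i

sapply(lapply_loop(1:4, fun1), function(f) f(2))   # 8 8 8 8
sapply(lapply_local(1:4, fun1), function(f) f(2))  # 2 4 6 8
```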
> >> Would the same semantics be applied to parallel::mclapply and friends?
> >> 
> >> sapply(lapply(1:4, function(c){function(i){c*i}}), function(f) f(2))
> >> 
> >> # [1] 8 8 8 8
> >> 
> >> sapply(mclapply(1:4, function(c){function(i){c*i}}), function(f) f(2))
> >> 
> >> # [1] 6 8 6 8
> >> 
> >> I understand why they differ, but making mclapply easier for 'drop-in' parallelism might be a good thing. 
> >> 
> >> Michael
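The force() function mentioned in Luke's reply already makes the result independent of the number of cores, with or without any change to lapply() and mclapply(). It works by evaluating c when each closure is created. This is a minimal sketch. `fun1_forced()` is a hypothetical rewrite of Benjamin's fun1(), and mc.cores is set to 1L on Windows, where mclapply() cannot fork:

```r
library(parallel)

# Benjamin's fun1(), rewritten so that c is evaluated when the closure is
# created instead of when it is first called.
fun1_forced <- function(c) {
  force(c)
  function(i) c * i
}
fun2 <- function(f) f(2)

# mclapply() cannot fork on Windows, so use a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial_res <- sapply(lapply(1:4, fun1_forced), fun2)
par_res    <- sapply(mclapply(1:4, fun1_forced, mc.cores = cores), fun2)
serial_res  # 2 4 6 8
par_res     # 2 4 6 8
```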


