[Rd] iterated lapply

luke-tierney at uiowa.edu
Sun Mar 1 20:37:50 CET 2015


On Sun, 1 Mar 2015, Radford Neal wrote:

> I think the discussion of this issue has gotten more complicated than
> necessary.

The discussion has gotten no more complicated than it needs to
be. There are other instances, such as Reduce, where there is a
pending bug report that amounts to the same issue.  Performing surgery
on expressions and calling eval is not good practice at the R level and
probably not a good idea at the C level either.  It is worth thinking
this through carefully before adopting a solution, which is what we
will be doing.

Best,

luke

>
> First, there really is a bug.  You can see this also by the fact that
> delayed warning messages are wrong.  For instance, in R-3.1.2:
>
>  > lapply(c(-1,2,-1),sqrt)
>  [[1]]
>  [1] NaN
>
>  [[2]]
>  [1] 1.414214
>
>  [[3]]
>  [1] NaN
>
>  Warning messages:
>  1: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
>  2: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
>
> The first warning message should have "1L" rather than "3L".  It
> doesn't because lapply made a destructive change to the R expression
> that was evaluated for the first element.  Throughout the R
> interpreter, there is a general assumption that expressions that are
> or were evaluated are immutable, which lapply is not abiding by.  The
> only question is whether the bugs from this are sufficiently obscure
> that it's worth keeping them for the gain in speed, but the speed cost
> of fixing it is fairly small (though it's not negligible when the
> function applied is something simple like sqrt).
>
> The fix in the C code for lapply, vapply, and eapply is easy: Rather
> than create an R expression such as FUN(X[[1L]]) for the first
> function call, and then modify it in place to FUN(X[[2L]]), and so
> forth, just create a new expression for each iteration.  This requires
> allocating a few new CONS cells each iteration, which does have a
> cost, but not a huge one.  It's certainly easier and faster than
> creating a new environment (and also less likely to cause
> incompatibilities).
>
> The R code for apply can be changed to use the same approach:
> rather than using expressions such as FUN(X[i,]), where i is an
> index variable, it can create expressions like FUN(X[1L,]), then
> FUN(X[2L,]), etc.  The method for this is simple, like so:
>
>  > a <- quote(FUN(X[i,]))     # "i" could be anything
>  > b <- a; b[[c(2,3)]] <- 1L  # change "i" to 1L (creates new expr)
>
> This has the added advantage of making error messages refer to the
> actual index, not to "i", which has no meaning if you haven't looked
> at the source code for apply (and which doesn't tell you which element
> led to the error even if you know what "i" does).
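
A minimal sketch of the expression-copying idea quoted above, written at
the R level. The helper name apply_rows_sketch is hypothetical and is not
taken from pqR or from base R; it only illustrates building a fresh call
for each index instead of modifying one call in place. It uses nested
`[[<-` assignment rather than the recursive index c(2,3), which is an
assumption made here for clarity.

```r
# Template call; the symbol "i" is only a placeholder to be replaced.
a <- quote(FUN(X[i, ]))

# Assigning into a copy leaves the template untouched (copy on modify).
b <- a
b[[2]][[3]] <- 1L   # element 2 is X[i, ]; its element 3 is the index

# Hypothetical helper (not from pqR or base R): apply FUN to each row of X,
# constructing a new call per row so earlier calls are never altered.
apply_rows_sketch <- function(X, FUN) {
  template <- quote(FUN(X[i, ]))
  out <- vector("list", nrow(X))
  for (k in seq_len(nrow(X))) {
    call_k <- template          # fresh copy for this iteration
    call_k[[2]][[3]] <- k       # substitute the actual integer index
    out[[k]] <- eval(call_k)    # evaluated in this function's frame
  }
  out
}

m <- matrix(1:6, nrow = 2)
res <- apply_rows_sketch(m, sum)
```

Because each iteration works on its own copy, a call captured during an
earlier iteration (for a delayed warning, say) still refers to the index
that was current when it was evaluated.
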
>
> I've implemented this in the development version of pqR, on the
> development branch 31-apply-fix, at
>
>  https://github.com/radfordneal/pqR/tree/31-apply-fix
>
> The changes are in src/main/apply.c, src/main/envir.c, and
> src/library/base/R/apply.R, plus a new test in tests/apply.R.  You can
> compare to branch 31 to see what's changed.  (Note rapply seems to not
> have had a problem, and that other apply functions just use these, so
> should be fixed as well.)  There are also other optimizations in pqR
> for these functions but the code is still quite similar to R-3.1.2.
>
>   Radford Neal
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


