[Rd] iterated lapply
luke-tierney at uiowa.edu
Sun Mar 1 20:37:50 CET 2015
On Sun, 1 Mar 2015, Radford Neal wrote:
> I think the discussion of this issue has gotten more complicated than
> necessary.
The discussion has gotten no more complicated than it needs to
be. There are other instances, such as Reduce where there is a bug
report pending that amounts to the same issue. Performing surgery on
expressions and calling eval is not good practice at the R level and
probably not a good idea at the C level either. It is worth thinking
this through carefully before adopting a solution, which is what we
will be doing.
Best,
luke
>
> First, there really is a bug. You can see this also by the fact that
> delayed warning messages are wrong. For instance, in R-3.1.2:
>
> > lapply(c(-1,2,-1),sqrt)
> [[1]]
> [1] NaN
>
> [[2]]
> [1] 1.414214
>
> [[3]]
> [1] NaN
>
> Warning messages:
> 1: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
> 2: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
>
> The first warning message should have "1L" rather than "3L". It
> doesn't because lapply made a destructive change to the R expression
> that was evaluated for the first element. Throughout the R
> interpreter, there is a general assumption that expressions that are
> or were evaluated are immutable, which lapply is not abiding by. The
> only question is whether the bugs from this are sufficiently obscure
> that it's worth keeping them for the gain in speed, but the speed cost
> of fixing it is fairly small (though it's not negligible when the
> function applied is something simple like sqrt).
>
> The fix in the C code for lapply, vapply, and eapply is easy: Rather
> than create an R expression such as FUN(X[[1L]]) for the first
> function call, and then modify it in place to FUN(X[[2L]]), and so
> forth, just create a new expression for each iteration. This requires
> allocating a few new CONS cells each iteration, which does have a
> cost, but not a huge one. It's certainly easier and faster than
> creating a new environment (and also less likely to cause
> incompatibilities).
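
The idea of building a new expression for each iteration can be sketched at the R level. This is only an illustration, not the C change described above and not code from pqR; the helper name lapply_fresh is hypothetical, and it omits the "..." pass-through that the real lapply has.

```r
# Hypothetical helper, for illustration only: an lapply-like loop that
# builds a fresh call FUN(X[[i]]) with a literal index for every element,
# instead of reusing and modifying one call object.
lapply_fresh <- function(X, FUN) {
  FUN <- match.fun(FUN)
  out <- vector("list", length(X))
  for (i in seq_along(X)) {
    # bquote() substitutes the current value of i, giving a new call object
    call_i <- bquote(FUN(X[[.(i)]]))
    # assign via out[i] so that a NULL result does not delete the element
    out[i] <- list(eval(call_i))
  }
  names(out) <- names(X)
  out
}

# Each invocation sees its own call, so the recorded index is the real one
calls <- lapply_fresh(c(-1, 2, -1), function(x) sys.call())
deparse(calls[[1]])
```

Because every call object is created once and never modified afterwards, anything that keeps a reference to it (a delayed warning, for example) continues to show the index that was actually used.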
>
> The R code for apply can be changed to use the same approach:
> rather than using expressions such as FUN(X[i,]), where i is an
> index variable, it can create expressions like FUN(X[1L,]), then
> FUN(X[2L,]), etc. The method for this is simple, like so:
>
> > a <- quote(FUN(X[i,])) # "i" could be anything
> > b <- a; b[[c(2,3)]] <- 1L # change "i" to 1L (creates new expr)
>
> This has the added advantage of making error messages refer to the
> actual index, not to "i", which has no meaning if you haven't looked
> at the source code for apply (and which doesn't tell you which element
> led to the error even if you know what "i" does).
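
A minimal sketch of how that substitution might be used in a loop over rows follows. It is an assumption-laden illustration rather than the code in the pqR branch: the helper name apply_rows is hypothetical, it handles only the row case, and it spells the replacement as [[2]][[3]], the two-step form of the [[c(2,3)]] index used above.

```r
# Hypothetical helper, for illustration only: apply FUN to each row of a
# matrix X, evaluating a call that contains the literal row index.
apply_rows <- function(X, FUN) {
  FUN <- match.fun(FUN)
  template <- quote(FUN(X[i, ]))   # "i" is only a placeholder
  out <- vector("list", nrow(X))
  for (k in seq_len(nrow(X))) {
    call_k <- template             # modifying call_k leaves template intact
    call_k[[2]][[3]] <- k          # replace "i" with the literal index k
    out[k] <- list(eval(call_k))
  }
  out
}

m <- matrix(1:6, nrow = 2)
sums <- apply_rows(m, sum)

# The call attached to an error now names the row that failed
err <- tryCatch(
  apply_rows(m, function(r) if (r[1] == 2L) stop("bad row") else r),
  error = function(e) e
)
deparse(conditionCall(err))
```

Here the failing row is the second one, so the call stored in the condition refers to index 2L rather than to the placeholder "i".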
>
> I've implemented this in the development version of pqR, on the
> development branch 31-apply-fix, at
>
> https://github.com/radfordneal/pqR/tree/31-apply-fix
>
> The changes are in src/main/apply.c, src/main/envir.c, and
> src/library/base/R/apply.R, plus a new test in tests/apply.R. You can
> compare to branch 31 to see what's changed. (Note that rapply seems
> not to have had a problem, and that the other apply functions just use
> these, so they should be fixed as well.) There are also other optimizations in pqR
> for these functions but the code is still quite similar to R-3.1.2.
>
> Radford Neal
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu