[Rd] iterated lapply
Radford Neal
radford at cs.toronto.edu
Sun Mar 1 18:17:33 CET 2015
I think the discussion of this issue has gotten more complicated than
necessary.
First, there really is a bug. You can see this also by the fact that
delayed warning messages are wrong. For instance, in R-3.1.2:
> lapply(c(-1,2,-1),sqrt)
[[1]]
[1] NaN
[[2]]
[1] 1.414214
[[3]]
[1] NaN
Warning messages:
1: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
2: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
The first warning message should have "1L" rather than "3L". It
doesn't because lapply made a destructive change to the R expression
that was evaluated for the first element. Throughout the R
interpreter, there is a general assumption that expressions that are
or were evaluated are immutable, which lapply is not abiding by. The
only question is whether the bugs from this are sufficiently obscure
that it's worth keeping them for the gain in speed, but the speed cost
of fixing it is fairly small (though it's not negligible when the
function applied is something simple like sqrt).
The fix in the C code for lapply, vapply, and eapply is easy: Rather
than create an R expression such as FUN(X[[1L]]) for the first
function call, and then modify it in place to FUN(X[[2L]]), and so
forth, just create a new expression for each iteration. This requires
allocating a few new CONS cells each iteration, which does have a
cost, but not a huge one. It's certainly easier and faster than
creating a new environment (and also less likely to cause
incompatibilities).
The R code for apply can be changed to use the same approach,
rather than using expressions such as FUN(X[i,]), where i is an
index variable, it can create expressions like FUN(X[1L,]), then
FUN(X[2L,]), etc. The method for this is simple, like so:
> a <- quote(FUN(X[i,])) # "i" could be anything
> b <- a; b[[c(2,3)]] <- 1L # change "i" to 1L (creates new expr)
This has the added advantage of making error messages refer to the
actual index, not to "i", which has no meaning if you haven't looked
at the source code for apply (and which doesn't tell you which element
led to the error even if you know what "i" does).
I've implemented this in the development version of pqR, on the
development branch 31-apply-fix, at
https://github.com/radfordneal/pqR/tree/31-apply-fix
The changes are in src/main/apply.R, src/main/envir.R, and
src/library/base/R/apply.R, plus a new test in tests/apply.R. You can
compare to branch 31 to see what's changed. (Note rapply seems to not
have had a problem, and that other apply functions just use these, so
should be fixed as well.) There are also other optimizations in pqR
for these functions but the code is still quite similar to R-3.1.2.
Radford Neal
More information about the R-devel
mailing list