[Rd] PATCH: Avoiding extra copies (NAMED bumped) with source(..., print.eval=FALSE) ...and with print.eval=TRUE?

Tue Jun 17 23:27:14 CEST 2014

OBJECTIVE:
To update source(..., print.eval=FALSE) to not use withVisible()
unless really needed. This avoids unnecessary increases of reference
counts/NAMED introduced by withVisible(), which in turn avoids
unnecessary memory allocations and garbage collection overhead.  This
has an impact on all source():ed scripts, e.g. pre-allocation of large
matrices to save memory does *not always* help in such setups.  It is
likely to also affect the evaluation of code chunks of various
vignette engines.


BACKGROUND:
If you run the following at the prompt, the assignment to the first
element does *not* cause an extra copy of 'x':

> x <- 1:2
> tracemem(x)
[1] "<0x000000001c5a7b28>"
> x[1] <- 0L
> x
[1] 0 2



However, if you source() the same code (with print.eval=FALSE, the
default), you get:

> code <- "
x <- 1:2
tracemem(x)
x[1] <- 0L
"
> source(textConnection(code))
tracemem[0x0000000010504e20 -> 0x0000000010509cd0]: eval eval withVisible source
> x
[1] 0 2

Looking at how source() works, this is because it effectively does:

> invisible(withVisible(x <- 1:2))
> invisible(withVisible(tracemem(x)))
> invisible(withVisible(x[1] <- 0L))
tracemem[0x00000000104b68a8 -> 0x00000000104b6788]: withVisible
> x
[1] 0 2


WORKAROUND HACK:
I understand that wrapping up multiple expressions into one avoids this:

> code <- "{
x <- 1:2
tracemem(x)
x[1] <- 0L
}"
> source(textConnection(code))

which in this case can be narrowed down to:

code <- "
{x <- 1:2; {}}
tracemem(x)
x[1] <- 0L
"
source(textConnection(code))

but that's not my point here.  Instead, I believe R can handle this
better itself.


DISCUSSION / PATCH:
It's quite easy to patch base::source(..., print.eval=FALSE) to avoid
the extra copies, because source() uses withVisible() so that it:

(a) can show()/print() the value of each expression (when
print.eval=TRUE), and
(b) can return the value of the last expression evaluated.

Thus, with print.eval=FALSE, withVisible() is only needed for the very
last expression evaluated.
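
To make the idea concrete outside of source(), here is a minimal
sketch of such an evaluation loop.  Note that eval_exprs() is a
hypothetical helper written for this illustration only; it is not
part of base R and it leaves out everything else source() does
(echoing, srcrefs, encodings, etc.):

```r
## Hypothetical helper (NOT part of base R): a stripped-down version of
## the evaluation loop in source().  withVisible() is only called when
## the visibility/value is actually needed.
eval_exprs <- function(exprs, envir = parent.frame(), print.eval = FALSE) {
  n <- length(exprs)
  yy <- NULL
  for (i in seq_len(n)) {
    ei <- exprs[[i]]
    if (print.eval || i == n) {
      ## Value is needed: either for printing or as the return value
      yy <- withVisible(eval(ei, envir))
      if (print.eval && yy$visible) print(yy$value)
    } else {
      ## Value is not needed: evaluate without wrapping it
      eval(ei, envir)
    }
  }
  invisible(yy)
}

exprs <- parse(text = "x <- 1:2; x[1] <- 0L; x")
res <- eval_exprs(exprs, envir = new.env())
str(res)
```

Only the last expression goes through withVisible() here, so 'res'
still has the same list(value, visible) structure as what source()
returns.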

Here is a patch to source() that avoids calling withVisible() unless needed:

$ svn diff src/library/base/R/source.R
Index: src/library/base/R/source.R
===================================================================
--- src/library/base/R/source.R (revision 65900)
+++ src/library/base/R/source.R (working copy)
@@ -206,7 +206,10 @@
            }
        }
        if (!tail) {
-           yy <- withVisible(eval(ei, envir))
+            if (print.eval || i == Ne+echo)
+                yy <- withVisible(eval(ei, envir))
+            else
+                eval(ei, envir)
            i.symbol <- mode(ei[[1L]]) == "name"
            if (!i.symbol) {
                ## ei[[1L]] : the function "<-" or other


With this patch you get:

> source(textConnection(code), echo=TRUE, print.eval=FALSE)

> x <- 1:2
> tracemem(x)
> x[1] <- 0L


> source(textConnection(code), echo=TRUE, print.eval=TRUE)

> x <- 1:2
> tracemem(x)
[1] "<0x000000001c5675c0>"
> x[1] <- 0L
tracemem[0x000000001c5675c0 -> 0x000000001c564ad0]: eval eval withVisible source


FURTHER IMPROVEMENTS:
Looking at the internals of withVisible():

/* This is a special .Internal */
SEXP attribute_hidden do_withVisible(SEXP call, SEXP op, SEXP args, SEXP rho)
{
    SEXP x, nm, ret;

    checkArity(op, args);
    x = CAR(args);
    x = eval(x, rho);
    PROTECT(x);
    PROTECT(ret = allocVector(VECSXP, 2));
    PROTECT(nm = allocVector(STRSXP, 2));
    SET_STRING_ELT(nm, 0, mkChar("value"));
    SET_STRING_ELT(nm, 1, mkChar("visible"));
    SET_VECTOR_ELT(ret, 0, x);
    SET_VECTOR_ELT(ret, 1, ScalarLogical(R_Visible));
    setAttrib(ret, R_NamesSymbol, nm);
    UNPROTECT(3);
    return ret;
}

I am not sure exactly where the reference count (NAMED) is bumped,
but *if it is possible to evaluate the expression and inspect
whether the value is "visible" or not before that happens*, then one
could imagine adding an option to withVisible() that tells it to only
return the value if the evaluated value is visible (otherwise NULL).
That way one could avoid extra copies in most cases also with
print.eval=TRUE, e.g.

> withVisible(x[1] <- 0L)
$value
[1] 0

$visible
[1] FALSE

In other words, whenever withVisible() returns visible=FALSE, the
return value is not used by source() (other than for the very last
expression, whose value is returned).
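
To illustrate the return contract I have in mind, here is an R-level
sketch.  The function with_visible_value() is a hypothetical name
made up for this example.  Since it is built on top of the existing
withVisible(), it cannot avoid the NAMED bump itself; it only shows
what a caller would see if such an option existed:

```r
## Hypothetical wrapper (NOT an existing R function): drop the value
## whenever the evaluated expression is invisible.  This only mimics
## the proposed return contract; it does not change memory behaviour.
with_visible_value <- function(expr, envir = parent.frame()) {
  res <- withVisible(eval(expr, envir))
  ## Keep the 'value' element, but set it to NULL if not visible
  if (!res$visible) res["value"] <- list(NULL)
  res
}

env <- new.env()
assign("x", 1:2, envir = env)

r1 <- with_visible_value(quote(x[1] <- 0L), env)  # invisible assignment
r2 <- with_visible_value(quote(x), env)           # visible value
str(r1)
str(r2)
```

A real implementation would have to do this inside do_withVisible(),
before the evaluated value is stored in the returned list.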

Comments?

/Henrik

> sessionInfo()
R Under development (unstable) (2014-06-12 r65926)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base