[Rd] Why is as.function() slower than eval(call("function"))?

Fri Aug 4 06:32:06 CEST 2017

(Apologies if this is better suited for R-help.)

On my system (macOS Sierra, late 2014 MacBook Pro; R 3.4.1, Homebrew build), I found that it is faster to construct a function using eval(call("function", ...)) than using as.function(list(...)). Example:

    make_fn_1 <- function(a, b) eval(call("function", a, b), env = parent.frame())
    make_fn_2 <- function(a, b) as.function(c(a, list(b)), env = parent.frame())

    a <- as.pairlist(alist(x = , y = ))
    b <- quote(x + y)

    library("microbenchmark")
    microbenchmark(make_fn_1(a, b), make_fn_2(a, b))

    # Unit: microseconds
    #             expr   min     lq    mean median     uq    max neval cld
    #  make_fn_1(a, b) 1.671 1.8855 2.13297  2.039 2.1950  9.852   100  a
    #  make_fn_2(a, b) 3.541 3.7230 4.13400  3.906 4.1055 23.153   100   b
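
Before comparing timings, it may be worth confirming that the two constructors really build the same function. The following is a minimal sketch of such a check, reusing the definitions above; the names f1 and f2 are illustrative only and are not part of the benchmark.

```r
# Same definitions as in the benchmark above.
make_fn_1 <- function(a, b) eval(call("function", a, b), env = parent.frame())
make_fn_2 <- function(a, b) as.function(c(a, list(b)), env = parent.frame())

a <- as.pairlist(alist(x = , y = ))
b <- quote(x + y)

# f1 and f2 are illustrative names for the two results.
f1 <- make_fn_1(a, b)
f2 <- make_fn_2(a, b)

# The closures should agree on formals, body, enclosing environment,
# and on the value they return for the same arguments.
stopifnot(
  identical(formals(f1), formals(f2)),
  identical(body(f1), body(f2)),
  identical(environment(f1), environment(f2)),
  f1(1, 2) == 3,
  f2(1, 2) == 3
)
```

If the checks pass, the benchmark is comparing two routes to an equivalent closure rather than two different results.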

At first I thought the gap was due to the overhead of calling c(a, list(b)). But that accounts for only part of it; as.function() is still slower when the list is built ahead of time:

    make_fn_weird <- function(a, b) as.function(c(a, b), env = parent.frame())
    b_wrapped <- list(b)

    make_fn_weirder <- function(a_b) as.function(a_b, env = parent.frame())
    a_b <- c(a, b_wrapped)

    microbenchmark(make_fn_1(a, b), make_fn_2(a, b),
                   make_fn_weird(a, b_wrapped), make_fn_weirder(a_b))

    # Unit: microseconds
    #                         expr   min     lq    mean median     uq    max neval cld
    #              make_fn_1(a, b) 1.718 1.8990 2.12119 1.9860 2.1605  8.057   100 a
    #              make_fn_2(a, b) 3.393 3.5865 4.03029 3.6655 3.9615 27.499   100   c
    #  make_fn_weird(a, b_wrapped) 3.354 3.5005 3.77190 3.6405 3.9425  6.839   100   c
    #         make_fn_weirder(a_b) 2.488 2.6290 2.83352 2.7215 2.8800  7.007   100  b

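One IRC user pointed out that as.function() takes its own path through the C code, namely do_asfunction() (in src/main/coerce.c). Even with the argument list precomputed, it is about 37% slower by median (make_fn_weirder() vs. make_fn_1() above). What is it about this code path that makes it slower than whatever happens during eval(call("function", a, b))?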
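
One way to narrow this down is to time the pieces separately. The sketch below assumes, without confirmation, that some of the cost might come from S3 dispatch, since as.function() is a generic that dispatches to as.function.default(). It uses only base R; the helper names via_generic, via_default and via_eval and the iteration count n are hypothetical choices made for illustration, and the full argument name envir is spelled out instead of env.

```r
a <- as.pairlist(alist(x = , y = ))
b <- quote(x + y)
a_b <- c(a, list(b))  # argument list built once, outside the timed code

# Hypothetical helpers: same work, reached by three different routes.
via_generic <- function(a_b) as.function(a_b, envir = parent.frame())
via_default <- function(a_b) as.function.default(a_b, envir = parent.frame())
via_eval    <- function(a, b) eval(call("function", a, b), envir = parent.frame())

n <- 1e5  # arbitrary iteration count; adjust for your machine

# Elapsed seconds for n calls along each route.
timings <- c(
  generic = system.time(for (i in seq_len(n)) via_generic(a_b))[["elapsed"]],
  default = system.time(for (i in seq_len(n)) via_default(a_b))[["elapsed"]],
  eval    = system.time(for (i in seq_len(n)) via_eval(a, b))[["elapsed"]]
)
print(timings)
```

If the generic and default timings are close, dispatch is unlikely to be the main cost and the difference would have to lie further down, in the R-level method or the C code. Results will vary by machine and R version, so this is a diagnostic rather than an answer.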

Obviously this is a trivial micro-optimization and it doesn't matter to 99% of users. Mostly asking out of curiosity, but also wondering if there's a more general lesson to be learned here.

Thanks!