[R] Parallel assignments and goto

Fox, John jfox at mcmaster.ca
Wed Feb 14 20:14:06 CET 2018


Dear Thomas,

This looks like a really interesting project, and I don't think that anyone responded to your message, though I may be mistaken.

I took a look at implementing parallel assignment and came up with:

passign <- function(..., envir=parent.frame()){
    exprs <- list(...)  # forces the ... arguments in the caller's frame
    vars <- names(exprs)
    exprs <- lapply(exprs, FUN=eval, envir=envir)
    # every right-hand side is evaluated before any assignment is made,
    # which is what gives the parallel-assignment semantics
    for (i in seq_along(exprs)){
        assign(vars[i], exprs[[i]], envir=envir)
    }
}

For example,

> fun <- function(){
+     a <- 10
+     passign(a=1, b=a + 2, c=3)
+     cat("a =", a, " b =", b, " c =", c, "\n")
+ }
> fun()
a = 1  b = 12  c = 3

This proves to be faster than what you tried, but still much slower than using a local variable (or variables) -- see below. I wouldn't be surprised if someone can come up with a faster implementation, but I suspect that the overhead of function calls will be hard to overcome. BTW, a version of my passign() that uses mapply() in place of a for loop (not shown) is even slower.
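For completeness, here is a sketch of an alternative I haven't benchmarked against the versions above; it leans on base R's list2env() instead of a loop over assign(). The parallel semantics again come from list(...) forcing every argument before anything is rebound:

```r
# Hypothetical variant of passign() built on base R's list2env().
# list(...) forces all the ... arguments in the caller's frame *before*
# any variable is rebound, so each right-hand side sees the old values.
passign2 <- function(..., envir = parent.frame()) {
  invisible(list2env(list(...), envir = envir))
}

f <- function() {
  a <- 10
  passign2(a = 1, b = a + 2, c = 3)  # b is computed from the old a (10)
  c(a = a, b = b, c = c)
}
f()  # a = 1, b = 12, c = 3
```

Whether this beats the assign() loop I can't say without measuring; the function-call overhead is likely still the dominant cost.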

> factorial_tr_3 <- function (n, acc = 1) {
+     repeat {
+         if (n <= 1) 
+             return(acc)
+         else {
+             passign(n = n - 1, acc = acc * n)
+             next
+         }
+     }
+ }
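(A quick sanity check, with passign() repeated so the snippet runs on its own: the rewritten loop should agree with the mathematical factorial.)

```r
# passign() as above, repeated so this check is self-contained
passign <- function(..., envir = parent.frame()) {
  exprs <- list(...)  # forces the arguments in the caller's frame
  vars <- names(exprs)
  for (i in seq_along(exprs)) {
    assign(vars[i], exprs[[i]], envir = envir)
  }
}

factorial_tr_3 <- function(n, acc = 1) {
  repeat {
    if (n <= 1)
      return(acc)
    else {
      passign(n = n - 1, acc = acc * n)
      next
    }
  }
}

factorial_tr_3(10)  # 3628800
```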

> microbenchmark::microbenchmark(factorial(100),
+ factorial_tr_1(100),
+ factorial_tr_2(100),
+ factorial_tr_3(100))
Unit: microseconds
                expr       min        lq       mean     median        uq       max neval cld
      factorial(100)    55.009    69.290   100.4507   104.5515   131.174   228.496   100 a  
 factorial_tr_1(100)    10.227    11.637    14.4967    13.7530    15.515    89.565   100 a  
 factorial_tr_2(100) 21523.751 23038.417 24477.1734 24058.3635 25041.988 45814.136   100   c
 factorial_tr_3(100)   806.789   861.797   914.3651   879.9565   925.444  2139.329   100  b

Best,
 John

-----------------------------
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: socialsciences.mcmaster.ca/jfox/




> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Thomas
> Mailund
> Sent: Sunday, February 11, 2018 10:49 AM
> To: r-help at r-project.org
> Subject: [R] Parallel assignments and goto
> 
> Hi guys,
> 
> I am working on some code for automatically translating recursive functions into
> looping functions to implemented tail-recursion optimisations. See
> https://github.com/mailund/tailr
> 
> As a toy-example, consider the factorial function
> 
> factorial <- function(n, acc = 1) {
>     if (n <= 1) acc
>     else factorial(n - 1, acc * n)
> }
> 
> I can automatically translate this into the loop-version
> 
> factorial_tr_1 <- function (n, acc = 1) {
>     repeat {
>         if (n <= 1)
>             return(acc)
>         else {
>             .tailr_n <- n - 1
>             .tailr_acc <- acc * n
>             n <- .tailr_n
>             acc <- .tailr_acc
>             next
>         }
>     }
> }
> 
> which will run faster and not run into problems with recursion depth. However,
> I’m not entirely happy with this version, for two reasons: it introduces
> temporary variables, and the rewrite will not work if I try to over-scope an
> evaluation context.
> 
> I have two related questions, one about parallel assignment (i.e., assigning
> expressions to variables so that each expression uses the old variable values,
> not the new ones, until all the assignments are done) and one about restarting
> a loop from nested loops or from nested expressions in `with` expressions or
> similar.
> 
> I can implement parallel assignment using something like rlang::env_bind:
> 
> factorial_tr_2 <- function (n, acc = 1) {
>     .tailr_env <- rlang::get_env()
>     repeat {
>         if (n <= 1)
>             return(acc)
>         else {
>             rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
>             next
>         }
>     }
> }
> 
> This reduces the number of additional variables I need to one, but is a couple of
> orders of magnitude slower than the first version.
> 
> > microbenchmark::microbenchmark(factorial(100),
> +                                factorial_tr_1(100),
> +                                factorial_tr_2(100))
> Unit: microseconds
>                      expr      min       lq       mean    median       uq      max neval
>       factorial(100)   53.978   60.543   77.76203   71.0635   85.947  180.251   100
>  factorial_tr_1(100)    9.022    9.903   11.52563   11.0430   11.984   28.464   100
>  factorial_tr_2(100) 5870.565 6109.905 6534.13607 6320.4830 6756.463 8177.635   100
> 
> 
> Is there another way to do parallel assignments that doesn’t cost this much in
> running time?
> 
> My other problem is the use of `next`. I would like to combine tail-recursion
> optimisation with pattern matching as in https://github.com/mailund/pmatch
> where I can, for example, define a linked list like this:
> 
> devtools::install_github("mailund/pmatch")
> library(pmatch)
> llist := NIL | CONS(car, cdr : llist)
> 
> and define a function for computing the length of a list like this:
> 
> list_length <- function(lst, acc = 0) {
>   force(acc)
>   cases(lst,
>         NIL -> acc,
>         CONS(car, cdr) -> list_length(cdr, acc + 1))
> }
> 
> The `cases` function creates an environment that binds variables in a pattern-
> description that over-scopes the expression to the right of `->`, so the recursive
> call in this example has access to the variables `cdr` and `car`.
> 
> I can transform a `cases` call to one that creates the environment containing the
> bound variables and then evaluate this using `eval` or `with`, but in either case,
> a call to `next` will not work in such a context. The expression will be evaluated
> inside `bind` or `with`, and not in the `list_length` function.
> 
> A version that *will* work, is something like this
> 
> factorial_tr_3 <- function (n, acc = 1) {
>     .tailr_env <- rlang::get_env()
>     .tailr_frame <- rlang::current_frame()
>     repeat {
>         if (n <= 1)
>             rlang::return_from(.tailr_frame, acc)
>         else {
>             rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
>             rlang::return_to(.tailr_frame)
>         }
>     }
> }
> 
> Here, again, I show this for the factorial function, since it is easier to
> follow than the list-length function.
> 
> This solution will also work if you return values from inside loops, where `next`
> wouldn’t work either.
> 
> Using `rlang::return_from` and `rlang::return_to` implements the right
> semantics, but costs me another order of magnitude in running time.
> 
> microbenchmark::microbenchmark(factorial(100),
>                                factorial_tr_1(100),
>                                factorial_tr_2(100),
>                                factorial_tr_3(100))
> Unit: microseconds
>                 expr       min         lq        mean     median        uq        max neval
>       factorial(100)    52.479    60.2640    93.43069    67.5130    83.925   2062.481   100
>  factorial_tr_1(100)     8.875     9.6525    49.19595    10.6945    11.217   3818.823   100
>  factorial_tr_2(100)  5296.350  5525.0745  5973.77664  5737.8730  6260.128   8471.301   100
>  factorial_tr_3(100) 77554.457 80757.0905 87307.28737 84004.0725 89859.169 171039.228   100
> 
> I can live with the “introducing extra variables” solution to parallel assignment,
> and I could hack my way out of using `with` or `bind` in rewriting `cases`, but
> restarting a `repeat` loop would really make for a nicer solution. I know that
> `goto` is considered harmful, but really, in this case, it is what I want.
> 
> A `callCC` version also solves the problem
> 
> factorial_tr_4 <- function(n, acc = 1) {
>     function_body <- function(continuation) {
>         if (n <= 1) {
>             continuation(acc)
>         } else {
>             continuation(list("continue", n = n - 1, acc = acc * n))
>         }
>     }
>     repeat {
>         result <- callCC(function_body)
>         if (is.list(result) && result[[1]] == "continue") {
>             n <- result$n
>             acc <- result$acc
>             next
>         } else {
>             return(result)
>         }
>     }
> }
> 
> But this requires that I know how to distinguish between a valid return value
> and a “next” tag, and it is still a lot slower than the `next` solution:
> 
> microbenchmark::microbenchmark(factorial(100),
>                                factorial_tr_1(100),
>                                factorial_tr_2(100),
>                                factorial_tr_3(100),
>                                factorial_tr_4(100))
> Unit: microseconds
>                 expr       min         lq        mean     median        uq        max neval
>       factorial(100)    54.109    61.8095    81.33167    81.8785    89.748    243.554   100
>  factorial_tr_1(100)     9.025     9.9035    11.38607    11.1990    12.008     22.375   100
>  factorial_tr_2(100)  5272.524  5798.3965  6302.40467  6077.7180  6492.959   9967.237   100
>  factorial_tr_3(100) 66186.080 72336.2810 76480.75172 73632.9665 75405.054 203785.673   100
>  factorial_tr_4(100)   270.978   302.7890   337.48763   313.9930   334.096   1425.702   100
> 
> I don’t necessarily need the tail-recursion optimisation to be faster than the
> recursive version; just getting out of the problem of too deep recursions is a
> benefit, but I would rather not pay with an order of magnitude for it. I could, of
> course, try to handle cases that work with `next` in one way, and other cases
> using `callCC`, but I feel it should be possible with a version that handles all cases
> the same way.
> 
> Is there any way to achieve this?
> 
> Cheers
> 	Thomas
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

