[R] Parallel assignments and goto

Thomas Mailund thomas.mailund at gmail.com
Sun Feb 11 18:25:52 CET 2018


I admit I didn’t know about Recall, but you are right, there is no direct support for this tail-recursion optimisation. For good reasons — it would break a lot of NSE. I am not attempting to solve tail-recursion optimisation for all cases. That wouldn’t work by just rewriting functions. It might be doable with JIT or something like that, but my goal is less ambitious.

Using local, though, might be an approach. I will play around with that tomorrow.

Cheers

On 11 Feb 2018, 18.19 +0100, David Winsemius <dwinsemius at comcast.net>, wrote:
>
> > On Feb 11, 2018, at 7:48 AM, Thomas Mailund <thomas.mailund at gmail.com> wrote:
> >
> > Hi guys,
> >
> > I am working on some code for automatically translating recursive functions into looping functions to implemented tail-recursion optimisations. See https://github.com/mailund/tailr
> >
> > As a toy-example, consider the factorial function
> >
> > factorial <- function(n, acc = 1) {
> > if (n <= 1) acc
> > else factorial(n - 1, acc * n)
> > }
> >
> > I can automatically translate this into the loop-version
> >
> > factorial_tr_1 <- function (n, acc = 1)
> > {
> > repeat {
> > if (n <= 1)
> > return(acc)
> > else {
> > .tailr_n <- n - 1
> > .tailr_acc <- acc * acc
> > n <- .tailr_n
> > acc <- .tailr_acc
> > next
> > }
> > }
> > }
> >
> > which will run faster and not have problems with recursion depths. However, I’m not entirely happy with this version for two reasons: I am not happy with introducing the temporary variables and this rewrite will not work if I try to over-scope an evaluation context.
> >
> > I have two related questions, one related to parallel assignments — i.e. expressions to variables so the expression uses the old variable values and not the new values until the assignments are all done — and one related to restarting a loop from nested loops or from nested expressions in `with` expressions or similar.
> >
> > I can implement parallel assignment using something like rlang::env_bind:
> >
> > factorial_tr_2 <- function (n, acc = 1)
> > {
> > .tailr_env <- rlang::get_env()
> > repeat {
> > if (n <= 1)
> > return(acc)
> > else {
> > rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
> > next
> > }
> > }
> > }
> >
> > This reduces the number of additional variables I need to one, but is a couple of orders of magnitude slower than the first version.
> >
> > > microbenchmark::microbenchmark(factorial(100),
> > + factorial_tr_1(100),
> > + factorial_tr_2(100))
> > Unit: microseconds
> > expr min lq mean median uq max neval
> > factorial(100) 53.978 60.543 77.76203 71.0635 85.947 180.251 100
> > factorial_tr_1(100) 9.022 9.903 11.52563 11.0430 11.984 28.464 100
> > factorial_tr_2(100) 5870.565 6109.905 6534.13607 6320.4830 6756.463 8177.635 100
> >
> >
> > Is there another way to do parallel assignments that doesn’t cost this much in running time?
> >
> > My other problem is the use of `next`. I would like to combine tail-recursion optimisation with pattern matching as in https://github.com/mailund/pmatch where I can, for example, define a linked list like this:
> >
> > devtools::install_github("mailund/pmatch”)
> > library(pmatch)
> > llist := NIL | CONS(car, cdr : llist)
> >
> > and define a function for computing the length of a list like this:
> >
> > list_length <- function(lst, acc = 0) {
> > force(acc)
> > cases(lst,
> > NIL -> acc,
> > CONS(car, cdr) -> list_length(cdr, acc + 1))
> > }
> >
> > The `cases` function creates an environment that binds variables in a pattern-description that over-scopes the expression to the right of `->`, so the recursive call in this example have access to the variables `cdr` and `car`.
> >
> > I can transform a `cases` call to one that creates the environment containing the bound variables and then evaluate this using `eval` or `with`, but in either case, a call to `next` will not work in such a context. The expression will be evaluated inside `bind` or `with`, and not in the `list_lenght` function.
> >
> > A version that *will* work, is something like this
> >
> > factorial_tr_3 <- function (n, acc = 1)
> > {
> > .tailr_env <- rlang::get_env()
> > .tailr_frame <- rlang::current_frame()
> > repeat {
> > if (n <= 1)
> > rlang::return_from(.tailr_frame, acc)
> > else {
> > rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
> > rlang::return_to(.tailr_frame)
> > }
> > }
> > }
> >
> > Here, again, for the factorial function since this is easier to follow than the list-length function.
> >
> > This solution will also work if you return values from inside loops, where `next` wouldn’t work either.
> >
> > Using `rlang::return_from` and `rlang::return_to` implements the right semantics, but costs me another order of magnitude in running time.
> >
> > microbenchmark::microbenchmark(factorial(100),
> > factorial_tr_1(100),
> > factorial_tr_2(100),
> > factorial_tr_3(100))
> > Unit: microseconds
> > expr min lq mean median uq max neval
> > factorial(100) 52.479 60.2640 93.43069 67.5130 83.925 2062.481 100
> > factorial_tr_1(100) 8.875 9.6525 49.19595 10.6945 11.217 3818.823 100
> > factorial_tr_2(100) 5296.350 5525.0745 5973.77664 5737.8730 6260.128 8471.301 100
> > factorial_tr_3(100) 77554.457 80757.0905 87307.28737 84004.0725 89859.169 171039.228 100
> >
> > I can live with the “introducing extra variables” solution to parallel assignment, and I could hack my way out of using `with` or `bind` in rewriting `cases`, but restarting a `repeat` loop would really make for a nicer solution. I know that `goto` is considered harmful, but really, in this case, it is what I want.
> >
> > A `callCC` version also solves the problem
> >
> > factorial_tr_4 <- function(n, acc = 1) {
> > function_body <- function(continuation) {
> > if (n <= 1) {
> > continuation(acc)
> > } else {
> > continuation(list("continue", n = n - 1, acc = acc * n))
> > }
> > }
> > repeat {
> > result <- callCC(function_body)
> > if (is.list(result) && result[[1]] == "continue") {
> > n <- result$n
> > acc <- result$acc
> > next
> > } else {
> > return(result)
> > }
> > }
> > }
> >
> > But this requires that I know how to distinguish between a valid return value and a tag for “next” and is still a lot slower than the `next` solution
> >
> > microbenchmark::microbenchmark(factorial(100),
> > factorial_tr_1(100),
> > factorial_tr_2(100),
> > factorial_tr_3(100),
> > factorial_tr_4(100))
> > Unit: microseconds
> > expr min lq mean median uq max neval
> > factorial(100) 54.109 61.8095 81.33167 81.8785 89.748 243.554 100
> > factorial_tr_1(100) 9.025 9.9035 11.38607 11.1990 12.008 22.375 100
> > factorial_tr_2(100) 5272.524 5798.3965 6302.40467 6077.7180 6492.959 9967.237 100
> > factorial_tr_3(100) 66186.080 72336.2810 76480.75172 73632.9665 75405.054 203785.673 100
> > factorial_tr_4(100) 270.978 302.7890 337.48763 313.9930 334.096 1425.702 100
> >
> > I don’t necessarily need the tail-recursion optimisation to be faster than the recursive version; just getting out of the problem of too deep recursions is a benefit, but I would rather not pay with an order of magnitude for it. I could, of course, try to handle cases that works with `next` in one way, and other cases using `callCC`, but I feel it should be possible with a version that handles all cases the same way.
> >
> > Is there any way to achieve this?
> >
> > Cheers
> > Thomas
>
> I didn't see any reference to the R `Recall` or `local` functions. I don't remember that tail optimization is something that R provides, however.
>
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
>
>
>
>
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list