[Rd] recursion problem using do.call(rbind, list(..,<S4>,..))

Martin Maechler maechler at stat.math.ethz.ch
Tue May 10 12:43:42 CEST 2016

This was originally a bug report about Matrix,
but the bug is rather a "design" bug in R, or a limitation.

This e-mail is a report of the status quo as I see it, and
call for comments, sugguests, help/hints for workarounds,
or even a suggestion for a programming task helping R core to
amend the status {re-writing the methods:::cbind and ...:rbind functions}:

If you read  ?rbind  carefully, you may have learned that rbind() and
cbind() are able to deal with S4 "matrix-like" objects, via the
hidden methods:::rbind /  methods:::cbind functions
where these recursively build on appropriate S4 methods for
rbind2() / cbind2().

That is how cbind() and rbind() work nowadays for Matrix-package

However, there is problem lurking from the above paragraph, and
for experienced programmers / computer scientists that may even
be obvious: recursion.

A simple MRE (minimal reproducible example) for the problem seen with Matrix:

  S <- sparseMatrix(i=1:4, j=9:6, x=pi*1:4)
  n <- 410 # it succeeds for 407 -- sometimes even with 408
  X <- replicate(n, S, simplify = FALSE)
  Xb <- do.call("rbind", X)
  ## -> Error in as(x, "CsparseMatrix") : node stack overflow

But as I said, that's not really the Matrix package. A small
reproducible example, not involving the Matrix package at all:

  MM <- setClass("mmatrix", contains="matrix")
  setMethod("rbind2", signature(x="mmatrix", y="mmatrix"),
	    function(x,y) as(base::rbind(unclass(x), unclass(y)), "mmatrix"))

  (m5 <- MM(diag(5)))
  m45 <- MM(matrix(1:20, 4,5))
  rbind(m5, m45) # fine

  ## fine with 400 :
  mL.4c <- replicate(400, m45, simplify=FALSE)
  mmm4 <- do.call(rbind, mL.4c)
  stopifnot(is(mmm4, "mmatrix"), dim(mmm4) == c(1600L, 5L))
  ## not ok with 410 :
  mL.410 <- replicate(410, m45, simplify=FALSE)
  mmm4 <- do.call(rbind, mL.410)
  ## Error in getExportedValue(pkg, name) (from #2) : node stack overflow

and the "node stack overflow"  is not too helpful.
Unfortunately, this is more than one problem, the first one being
that recursive function calls nowadays often end with this
"node stack overflow" error, rather than the somewhat more understandable
error message, namely

  Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

And even worse, that nowadays increasing the option to a higher number N,
    options(expressions = N)

does not help at all in this situation, once the code is byte
compiled.... which of course is true for everything in base R.

But that is basically only the reason for the not-so-helpful
error message (and that raising 'expressions' does not help!),
but the underlying problem is somewhat harder, namely the full
setup the S4-serving methods:::rbind() {and ...cbind()} using
recursion (in the way it does) at all.

There is a simple, in my eyes ugly workaround which also will not scale well,
but works fine and fast enough for the above example:

## Simple ugly workaround .. but well fast enough :
> system.time({
+ r <- mL.410[[1]]
+ for(i in seq_along(mL.410)[-1])
+     r <- rbind2(r, mL.410[[i]])
+ })
   user  system elapsed 
  0.083   0.000   0.083 
> dim(r)
[1] 1640    5

This should help Ben (the OP of the Matrix bug), and may be
something like that should also guide on how to re-write
the  methods:::rbind()  / methods:::cbind()  in a non-recursive
fashion ?

Thank you for your thoughts.

Martin Maechler
ETH Zurich

More information about the R-devel mailing list