[Rd] recursion problem using do.call(rbind, list(..,<S4>,..))

Hadley Wickham h.wickham at gmail.com
Fri May 13 22:11:09 CEST 2016


Hi Martin,

I think this is a general problem with any function that does dispatch
on ... - I think for well-defined behaviour you always need to
dispatch on pairs, folding/reducing (like in your code) to get a
single value. The downside of this approach is obviously performance:
for n inputs, you need n - 1 temporary objects, which is going to
hamper performance (which is problematic for rbind since it's not
uncommon to have lists of thousands of small data frames). I don't see
anyway around this, except maybe you just have to have specialised
variants (like dplyr::bind_rows() and data.table::rbindlist()) that
don't do dispatch.

I've cc'd Michael since he might have missed the original thread, and
I suspect he's probably thought more about this than me.

Hadley

On Tue, May 10, 2016 at 5:43 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
> This was originally a bug report about Matrix,
>   https://r-forge.r-project.org/tracker/?func=detail&atid=294&aid=6325&group_id=61
> but the bug is rather a "design" bug in R, or a limitation.
>
> This e-mail is a report of the status quo as I see it, and
> call for comments, sugguests, help/hints for workarounds,
> or even a suggestion for a programming task helping R core to
> amend the status {re-writing the methods:::cbind and ...:rbind functions}:
>
>
> If you read  ?rbind  carefully, you may have learned that rbind() and
> cbind() are able to deal with S4 "matrix-like" objects, via the
> hidden methods:::rbind /  methods:::cbind functions
> where these recursively build on appropriate S4 methods for
> rbind2() / cbind2().
>
> That is how cbind() and rbind() work nowadays for Matrix-package
> matrices.
>
> However, there is problem lurking from the above paragraph, and
> for experienced programmers / computer scientists that may even
> be obvious: recursion.
>
> A simple MRE (minimal reproducible example) for the problem seen with Matrix:
>
>   S <- sparseMatrix(i=1:4, j=9:6, x=pi*1:4)
>   n <- 410 # it succeeds for 407 -- sometimes even with 408
>   X <- replicate(n, S, simplify = FALSE)
>   Xb <- do.call("rbind", X)
>   ## -> Error in as(x, "CsparseMatrix") : node stack overflow
>
> But as I said, that's not really the Matrix package. A small
> reproducible example, not involving the Matrix package at all:
>
>   MM <- setClass("mmatrix", contains="matrix")
>   setMethod("rbind2", signature(x="mmatrix", y="mmatrix"),
>             function(x,y) as(base::rbind(unclass(x), unclass(y)), "mmatrix"))
>
>   (m5 <- MM(diag(5)))
>   m45 <- MM(matrix(1:20, 4,5))
>   rbind(m5, m45) # fine
>
>   ## fine with 400 :
>   mL.4c <- replicate(400, m45, simplify=FALSE)
>   mmm4 <- do.call(rbind, mL.4c)
>   stopifnot(is(mmm4, "mmatrix"), dim(mmm4) == c(1600L, 5L))
>   ## not ok with 410 :
>   mL.410 <- replicate(410, m45, simplify=FALSE)
>   mmm4 <- do.call(rbind, mL.410)
>   ## Error in getExportedValue(pkg, name) (from #2) : node stack overflow
>
> and the "node stack overflow"  is not too helpful.
> Unfortunately, this is more than one problem, the first one being
> that recursive function calls nowadays often end with this
> "node stack overflow" error, rather than the somewhat more understandable
> error message, namely
>
>   Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
>
> And even worse, that nowadays increasing the option to a higher number N,
>     options(expressions = N)
>
> does not help at all in this situation, once the code is byte
> compiled.... which of course is true for everything in base R.
>
> But that is basically only the reason for the not-so-helpful
> error message (and that raising 'expressions' does not help!),
> but the underlying problem is somewhat harder, namely the full
> setup the S4-serving methods:::rbind() {and ...cbind()} using
> recursion (in the way it does) at all.
>
> There is a simple, in my eyes ugly workaround which also will not scale well,
> but works fine and fast enough for the above example:
>
> ## Simple ugly workaround .. but well fast enough :
>> system.time({
> + r <- mL.410[[1]]
> + for(i in seq_along(mL.410)[-1])
> +     r <- rbind2(r, mL.410[[i]])
> + })
>    user  system elapsed
>   0.083   0.000   0.083
>> dim(r)
> [1] 1640    5
>>
>
> This should help Ben (the OP of the Matrix bug), and may be
> something like that should also guide on how to re-write
> the  methods:::rbind()  / methods:::cbind()  in a non-recursive
> fashion ?
>
> Thank you for your thoughts.
>
> Martin Maechler
> ETH Zurich
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
http://hadley.nz



More information about the R-devel mailing list