[Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

Iñaki Ucar ucar at fedoraproject.org
Sat Apr 13 00:45:19 CEST 2019


On Fri, 12 Apr 2019 at 21:32, Travers Ching <traversc at gmail.com> wrote:
>
> Just throwing my two cents in:
>
> I think removing/deprecating fork would be a bad idea for two reasons:
>
> 1) There are no performant alternatives

"Performant"... in terms of what. If the cost of copying the data
predominates over the computation time, maybe you didn't need
parallelization in the first place.
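
A quick way to see that trade-off for yourself (the cluster size and
the size of x below are arbitrary; if the export time dwarfs the
compute time, parallelizing wasn't worth it):

  library(parallel)
  cl <- makeCluster(4, type = "PSOCK")
  x <- numeric(1e8)  # ~800 MB of doubles
  ## Cost of shipping x to the workers once:
  system.time(clusterExport(cl, "x"))
  ## Cost of the actual computation:
  system.time(parSapply(cl, 1:4, function(i) sum(x)))
  stopCluster(cl)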

> 2) Removing fork would break existing workflows

I don't see why mclapply could not be rewritten using PSOCK clusters.
And as a side effect, this would enable those workflows on Windows,
which doesn't support fork.
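
As a rough sketch (psock_lapply is a hypothetical name, and the hard
part, automatically exporting globals the way packages like future
already do, is left out here):

  library(parallel)
  ## Hypothetical stand-in with the same basic call shape as
  ## mclapply(), but backed by a PSOCK cluster, so it also runs
  ## on Windows:
  psock_lapply <- function(X, FUN, ..., mc.cores = 2L) {
    cl <- makeCluster(mc.cores, type = "PSOCK")
    on.exit(stopCluster(cl))
    parLapply(cl, X, FUN, ...)
  }
  psock_lapply(1:4, function(i) i^2, mc.cores = 2L)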

> Even if replaced with something using the same interface (e.g., a
> function that automatically detects variables to export as in the
> amazing `future` package), the lack of copy-on-write functionality
> would cause scripts everywhere to break.

To implement copy-on-write, Linux overcommits virtual memory, and
this is what causes scripts to break unexpectedly: everything works
fine until you change some small, seemingly unimportant bit and...
boom, out of memory. And in general, forking from within any GUI
front end can cause things everywhere to break.
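
(For reference, the overcommit policy on Linux can be inspected from
R; the values are the ones documented by the kernel:)

  ## Linux-only: 0 = heuristic overcommit (the default),
  ## 1 = always overcommit, 2 = don't overcommit
  readLines("/proc/sys/vm/overcommit_memory")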

> A simple example illustrating these two points:
> `x <- 5e8; mclapply(1:24, sum, x, 8)`
>
> Using fork, `mclapply` takes 5 seconds.  Using "psock", `clusterApply`
> does not complete.

I'm not sure how you set that up, but it does complete. Or do you
mean that you ran out of memory? Then try replacing "x" with, e.g.,
"x+1" in your mclapply example and see what happens (hint: save your
work first).
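
Spelled out, and assuming x is a large numeric vector rather than the
scalar in the quoted snippet (don't run this on a machine whose RAM
you care about):

  library(parallel)
  x <- numeric(5e8)  # ~4 GB of doubles
  ## Children only read x, so its pages stay shared with the parent:
  mclapply(1:24, function(i) sum(x), mc.cores = 8)
  ## Each child evaluating x + 1 materialises its own ~4 GB copy;
  ## with 8 concurrent workers the overcommitted memory can run out
  ## mid-computation:
  mclapply(1:24, function(i) sum(x + 1), mc.cores = 8)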

--
Iñaki Úcar


