[Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

Travers Ching tr@ver@c @end|ng |rom gm@||@com
Fri Apr 12 21:31:24 CEST 2019


Just throwing my two cents in:

I think removing/deprecating fork would be a bad idea for two reasons:

1) There are no performant alternatives
2) Removing fork would break existing workflows

Even if replaced with something using the same interface (e.g., a
function that automatically detects variables to export as in the
amazing `future` package), the lack of copy-on-write functionality
would cause scripts everywhere to break.

A simple example illustrating these two points:
`x <- 5e8; mclapply(1:24, sum, x, 8)`

Using fork, `mclapply` takes 5 seconds.  Using "psock", `clusterApply`
does not complete.
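
For anyone who wants to try the comparison, here is a minimal sketch of the two approaches side by side. It rests on assumptions that are not in the one-liner above: that the intent is for every task to sum one large vector `x`, and that the trailing `8` is meant as `mc.cores = 8` (it has to be named, since positional arguments after the function are passed on to it through `...`). The vector size and worker count are reduced so the sketch finishes quickly; they are illustrative, not the values behind the timings quoted above.

```r
library(parallel)

# Stand-in for the large object (assumption: a numeric vector; the size is
# deliberately small here so the sketch runs in well under a second).
x <- numeric(1e6)

# Fork-based: the children share the parent's memory copy-on-write, so `x`
# is visible in each child without being serialized or copied.
# Forking is not available on Windows, where mc.cores > 1 is an error.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_fork <- mclapply(1:4, function(i) sum(x), mc.cores = cores)

# PSOCK-based: the workers are fresh R processes, so `x` must be exported
# (serialized and copied) to every worker before it can be used.
cl <- makeCluster(2)                       # PSOCK is the default cluster type
clusterExport(cl, "x", envir = environment())
res_psock <- clusterApply(cl, 1:4, function(i) sum(x))
stopCluster(cl)
```

Both calls return the same list; the difference is that the PSOCK version pays for one full copy of `x` per worker, which is the cost that grows with the size of the object.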

Travers

On Fri, Apr 12, 2019 at 2:32 AM Iñaki Ucar <iucar using fedoraproject.org> wrote:
>
> On Thu, 11 Apr 2019 at 22:07, Henrik Bengtsson
> <henrik.bengtsson using gmail.com> wrote:
> >
> > ISSUE:
> > Using *forks* for parallel processing in R is not always safe.
> > [...]
> > Comments?
>
> Using fork() is never safe. The reference provided by Kevin [1] is
> pretty compelling (I kindly encourage anyone who ever forked a process
> to read it). Therefore, I'd go beyond Henrik's suggestion, and I'd
> advocate for deprecating fork clusters and eventually removing them
> from parallel.
>
> [1] https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf
>
> --
> Iñaki Úcar
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


