[Rd] Is there a way to disable / warn about forking?
simon.urbanek at r-project.org
Tue Oct 4 19:00:27 CEST 2011
On Oct 4, 2011, at 4:43 AM, Thomas Friedrichsmeier wrote:
> Dear R developers,
> with the inclusion of the package "parallel" in the upcoming release of R,
> users and package developers are likely to make increasing use of
> parallelization features. In part, these features rely on forking the R
> process. As ?mcfork points out, fork()ing in a GUI process is typically a bad
> idea. In RKWard, we "only" seem to have problems with signals arriving in the
> wrong threads, and occasional failure to collect the results from child
> processes. I haven't entirely given up hope of fixing this eventually, but as a
> consequence, parallelization based on forking is currently not usable inside
> an RKWard session.
> I am somewhat worried that, as library(parallel) gains acceptance,
> unsuspecting users will increasingly start to run into forking-related
> problems in RKWard and other environments.
I don't see why this should be anything new: it is already happening, since both packages that were folded into parallel (snow and multicore) are well known and widely used.
In multicore we explicitly warned about this and also worked around issues where possible (e.g., in the Mac GUI). Judging by the widespread use of multicore and the absence of problem reports related to GUIs, my impression is that this aspect is not really a problem (more below). For example, we get more users confused by the inability to perform side effects than by this.
In general, there are two main issues that can be addressed by the GUI:
a) Shared file descriptors. This is a problem if the GUI uses file descriptors (FDs) for communication and they are not closed in the child process: you don't want both the child and the parent to process those FDs. For example, closeAll() can be used to work around that issue, and since parallel is in core R, it could offer an easier interface for this.
b) Event loop. If the GUI hooks into the event loop then, obviously, the hook is only intended to run in the master. multicore was already disabling the event loop hook for AQUA, but it was hard to provide a more comprehensive solution since that needed R's cooperation. In parallel it's much easier, because it can modify R to run the event loop conditionally, and thus only in the master process.
The whole point of parallel is that it can do more than an external package, so I think you're going about it the wrong way: you should be talking to us much earlier, so that whatever constraints you have in RKWard can be addressed by the infrastructure. Also note that a lot of this should be seamless. Many users don't care what the infrastructure is; they just want their task to run in parallel and don't care about mcfork() and the like. The choices will be made for them because, for example, there is no fork on Windows.
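As a rough illustration of "the choices will be made for them", here is a minimal sketch that relies only on the parallel functions mclapply(), makePSOCKcluster(), parLapply() and stopCluster(). The helper `par_lapply()` and its `use_fork` argument are hypothetical and not part of parallel. The helper forks where fork() exists and otherwise starts a socket cluster of fresh R processes.

```r
library(parallel)

# Hypothetical helper (not part of parallel): apply FUN over X in parallel,
# forking where fork() is available and using a PSOCK cluster otherwise.
par_lapply <- function(X, FUN, ..., cores = 2L,
                       use_fork = .Platform$OS.type != "windows") {
  if (use_fork) {
    # Forked children inherit the parent's state, so no setup is needed.
    return(mclapply(X, FUN, ..., mc.cores = cores))
  }
  # Socket workers are new R processes; any globals or packages FUN needs
  # would have to be sent to them (e.g. with clusterExport(); not done here).
  cl <- makePSOCKcluster(cores)
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, X, FUN, ...)
}
```

A session where forking is known to misbehave could call `par_lapply(X, FUN, use_fork = FALSE)` and still get local parallelism.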
> Therefore, my wishes are:
> - The warning from ?mcfork about potential complications should also be
> visible on the documentation pages for the higher-level functions
> mcparallel() and mclapply(), and also makeForkCluster().
> - It would be nice to have a way to tell library(parallel) that forking is a
>   bad idea in the current session, so that
>   - mcfork() could stop with an informative error message, or at least produce
>     a warning, and mclapply() could fall back to mc.cores=1 with a warning;
>   - third-party packages that wish to use parallelization could check whether
>     it is safe to use forking, or whether another mechanism should be used.
> I am aware that options(mc.cores=1) will effectively disable forking in
> mclapply(). However, this would make it look as if (local) parallelization is
> not worthwhile at all, when in fact parallelization with
> makePSOCKcluster() works just fine. So, I'm looking for a way to selectively
> disable the use of forking.
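The following is a minimal user-level sketch of the requested switch, not an existing feature of parallel. It assumes a hypothetical session option, `fork.unsafe`, that a GUI such as RKWard would set at startup, and a hypothetical wrapper, `safe_mclapply()`. Neither is part of R or parallel. When the option is set, the wrapper warns and runs with mc.cores = 1 instead of forking, while socket clusters from makePSOCKcluster() remain usable.

```r
library(parallel)

# Hypothetical session option (not a real R option): a GUI that cannot
# tolerate fork() would call options(fork.unsafe = TRUE) at startup.
fork_is_safe <- function() {
  !isTRUE(getOption("fork.unsafe", FALSE))
}

# Hypothetical wrapper: like mclapply(), but falls back to a single core
# (so no child processes are forked) with a warning when forking is flagged
# as unsafe.
safe_mclapply <- function(X, FUN, ..., mc.cores = getOption("mc.cores", 2L)) {
  if (mc.cores > 1L && !fork_is_safe()) {
    warning("forking is flagged as unsafe in this session; using mc.cores = 1")
    mc.cores <- 1L
  }
  mclapply(X, FUN, ..., mc.cores = mc.cores)
}
```

A third-party package could call `fork_is_safe()` before choosing a backend and switch to makePSOCKcluster() when it returns FALSE.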