[Rd] Force quitting a FORK cluster node on macOS and Solaris wreaks havoc

Simon Urbanek <simon.urbanek using R-project.org>
Fri Aug 13 00:58:01 CEST 2021


Henrik,

I'm not quite sure I understand the report to be honest.

Just a quick comment here: using quit() in a forked child is not allowed, because the R clean-up is intended only for the master. Run from a child, it blows away the master's state, connections and working directory, runs the master's exit handlers, etc. That's why the children have to terminate with either abort or mcexit(), which is what mcparallel() does. If you use q(), a lot of things go wrong no matter the platform; e.g. try using ? in the master session after sourcing your code.
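
To illustrate the point about mcparallel(), here is a minimal sketch, assuming a Unix-alike platform where forking is available. The variable names (job, res) are only illustrative. The child is created with mcparallel(), evaluates its expression and terminates without ever calling quit(), so the master's clean-up is never run from the child:

```r
library(parallel)

# Assumption: a Unix-alike platform where forking is available.
# mcparallel() forks a child and evaluates the expression there; the
# child never calls quit(), so the master's clean-up is not triggered.
job <- mcparallel({
  a <- 42
  list(pid = Sys.getpid(), a = a)
})

# Collect the child's result in the master (a list with one element per job).
res <- mccollect(job)[[1]]

# The child ran in its own process and the master session is still intact.
stopifnot(res$pid != Sys.getpid(), res$a == 42)
message("master still alive, child pid was ", res$pid)
```

After this runs, the master session should behave normally, in contrast to the quit() case described above.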

Cheers,
Simon


> On 12/08/2021, at 8:22 PM, Henrik Bengtsson <henrik.bengtsson using gmail.com> wrote:
> 
> The following smells like a bug in R to me, because it puts the main R
> session into an unstable state.  Consider the following R script:
> 
> a <- 42
> message("a=", a)
> cl <- parallel::makeCluster(1L, type="FORK")
> try(parallel::clusterEvalQ(cl, quit(save="no")))
> message("parallel:::isChild()=", parallel:::isChild())
> message("a=", a)
> rm(a)
> 
> The purpose of this was to emulate what happens when a parallel
> worker crashes.
> 
> Now, if you source() the above on macOS, you might(*) end up with:
> 
>> a <- 42
>> message("a=", a)
> a=42
>> cl <- parallel::makeCluster(1L, type="FORK")
>> try(parallel::clusterEvalQ(cl, quit(save="no")))
> Error: Error in unserialize(node$con) : error reading from connection
>> message("parallel:::isChild()=", parallel:::isChild())
> parallel:::isChild()=FALSE
>> message("a=", a)
> a=42
>> rm(a)
>> try(parallel::clusterEvalQ(cl, quit(save="no")))
> Error: Error in unserialize(node$con) : error reading from connection
>> message("parallel:::isChild()=", parallel:::isChild())
> parallel:::isChild()=FALSE
>> message("a=", a)
> Error: Error in message("a=", a) : object 'a' not found
> Execution halted
> 
> Note how 'rm(a)' is supposed to be the last line of code to be
> evaluated.  However, the force quitting of the FORK cluster node
> appears to result in the main code being evaluated twice (in
> parallel?).
> 
> (*) This does not happen on all macOS variants. For example, it works
> fine on CRAN's 'r-release-macos-x86_64' but it does give the above
> behavior on 'r-release-macos-arm64'.  I can reproduce it on GitHub
> Actions (https://github.com/HenrikBengtsson/teeny/runs/3309235106?check_suite_focus=true#step:10:219)
> but not on R-hub's 'macos-highsierra-release' and
> 'macos-highsierra-release-cran'.  I can also reproduce it on R-hub's
> 'solaris-x86-patched' and 'solaris-x86-patched-ods' machines.  However,
> I still haven't found a Linux machine where this happens.
> 
> If one replaces quit(save="no") with tools::pskill(Sys.getpid()) or
> parallel:::mcexit(0L), this behavior does not take place (at least not
> on GitHub Actions and R-hub).
> 
> I don't have access to a macOS or a Solaris machine, so I cannot
> investigate further myself. For example, could it be an issue with
> quit(), or is it possible to trigger it by other means? And more
> importantly, should this be fixed? Also, I'd be curious what happens
> if you run the above in an interactive R session.
> 
> /Henrik
> 
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 


