[R-sig-hpc] multicore: when a core wanders off...

Vincenzo Carey carey.vj at gmail.com
Mon Feb 28 22:35:24 CET 2011


I have been working with a system in which, most of the time,
a long-running mclapply fails, ostensibly because at least
one node has simply lost the child process assigned
to it.  Other nodes succeed in writing some data.
stderr from the R process in which mclapply was invoked contains
nothing informative; it seems R simply
dies.  Any suggestions on how to get more diagnostic
information?  Running gdb on the master R process doesn't seem relevant.

It seems feasible, using parallel/collect, to write a fault-tolerant
mclapply-like function that would attempt to fill in list cells
that failed to populate in the expected time.  Has anyone undertaken
such a thing?
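
A rough sketch of what such a function might look like follows.  It is
written against the base "parallel" package, where multicore's
parallel() and collect() are available as mcparallel() and mccollect(),
so it assumes a Unix-alike system.  The name mclapply_retry and the
arguments mc.cores and max_tries are hypothetical, chosen only for
illustration.  Each cell is run in its own child, and any cell whose
child delivered nothing (or raised an error) is re-run on the next pass.

```r
library(parallel)  # mcparallel() and mccollect(); Unix-alikes only

# Hypothetical helper: mclapply-like, but re-runs cells that failed.
mclapply_retry <- function(X, FUN, ..., mc.cores = 2L, max_tries = 3L) {
  results <- vector("list", length(X))
  done <- rep(FALSE, length(X))

  for (attempt in seq_len(max_tries)) {
    todo <- which(!done)
    if (length(todo) == 0L) break

    # Run the outstanding cells in batches of at most mc.cores children.
    batches <- split(todo, ceiling(seq_along(todo) / mc.cores))
    for (batch in batches) {
      jobs <- lapply(batch, function(i) {
        # Wrap the value in a list so that a legitimate NULL result can
        # be told apart from a child that delivered nothing at all.
        mcparallel(list(value = FUN(X[[i]], ...)))
      })
      out <- mccollect(jobs)  # blocks until every child in the batch exits

      for (k in seq_along(batch)) {
        r <- if (k <= length(out)) out[[k]] else NULL
        # A dead child yields NULL; an R error yields a "try-error" object.
        ok <- is.list(r) && !inherits(r, "try-error") &&
          "value" %in% names(r)
        if (ok) {
          i <- batch[k]
          results[i] <- list(r$value)  # list() keeps a NULL value in place
          done[i] <- TRUE
        }
      }
    }
  }

  if (any(!done)) {
    warning("cells still unfilled after ", max_tries, " tries: ",
            paste(which(!done), collapse = ", "))
  }
  results
}
```

This sketch only handles children that exit or raise an error; it does
not enforce the "expected time" part.  A child that hangs would still
block the call, since mccollect() is used here with its default
wait = TRUE.  mccollect() also takes wait = FALSE and a timeout
argument, which might be a starting point for a polling version that
gives up on slow children and re-runs them.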


