[R-sig-hpc] multicore: when a core wanders off...
Vincenzo Carey
carey.vj at gmail.com
Mon Feb 28 22:35:24 CET 2011
I have been working with a system in which, most of the time,
a long-running mclapply fails, ostensibly because at least
one node has simply lost the child process assigned
to it. Other nodes succeed in writing some data.
The stderr of the R process in which mclapply was invoked contains
nothing of interest; it seems R simply
dies. Any suggestions on how to get more diagnostic
information? Running gdb on the master R process doesn't seem relevant.

It seems feasible, using parallel/collect, to write a fault-tolerant
mclapply-like function that would attempt to fill in list cells
that failed to populate in the expected time. Has anyone attempted this?
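
A minimal sketch of the retry half of that idea, written against the `parallel` package (which absorbed multicore in R 2.14.0) rather than multicore itself. The names `mclapply_retry` and `max.retries` are hypothetical. The sketch relies on documented mclapply() behaviour: with `mc.preschedule = FALSE` each element gets its own fork, a value whose forked process was killed or failed to deliver a result comes back as NULL, and a value whose call raised an error comes back as a "try-error" object. It assumes FUN never legitimately returns NULL, and it does not handle a child that hangs rather than dies, since mclapply() would simply block; covering that case would need mcparallel() plus polling with mccollect(wait = FALSE, timeout = ...). Being fork-based, it only runs in parallel on POSIX systems.

```r
library(parallel)

## Hypothetical helper, not part of multicore or parallel: run mclapply()
## with one fork per element, then re-run only the cells that came back
## NULL (child killed or lost) or as "try-error" (FUN signalled an error).
## Assumes FUN never legitimately returns NULL.
mclapply_retry <- function(X, FUN, ..., max.retries = 3L,
                           mc.cores = getOption("mc.cores", 2L)) {
  X <- as.list(X)
  res <- vector("list", length(X))
  todo <- seq_along(X)  # indices that still lack a good result
  for (attempt in seq_len(max.retries + 1L)) {
    if (length(todo) == 0L) break
    ## mc.preschedule = FALSE: a dead child costs only its own cell
    out <- mclapply(X[todo], FUN, ..., mc.preschedule = FALSE,
                    mc.cores = mc.cores)
    failed <- vapply(out,
                     function(r) is.null(r) || inherits(r, "try-error"),
                     logical(1))
    res[todo[!failed]] <- out[!failed]  # keep the good cells
    todo <- todo[failed]                # retry only the bad ones
  }
  if (length(todo) > 0L)
    warning(sprintf("%d element(s) still failing after %d retries",
                    length(todo), max.retries))
  names(res) <- names(X)
  res
}

## usage
sq <- mclapply_retry(1:4, function(i) i^2, mc.cores = 2)
unlist(sq)  # 1 4 9 16
```

Retrying every NULL cell is the simplest way to detect a lost child here; if FUN can return NULL as a normal result, wrapping its return value (for example in a one-element list) would keep the two cases apart.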