[R-sig-hpc] Rmpi::mpi.spawn.Rslaves() stalls

Cédric Lachat cedric@l@ch@t @ending from u-borde@ux@fr
Thu Jul 5 11:43:58 CEST 2018


Hello,

I have the same issue.

I launch:
gdb Rscript

I pass my R script as the argument.
I then get this output, and the program stalls:
...
Be patient, lcmm is running ...
    4 slaves are spawned successfully. 0 failed.

I then kill all R processes:
killall R

In gdb, I get this backtrace:
#0  0x00007ffff733974d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff1438e58 in ?? () from /usr/lib/libopen-pal.so.13
#2  0x00007ffff142f6fb in opal_libevent2021_event_base_loop () from /usr/lib/libopen-pal.so.13
#3  0x00007ffff13f9238 in opal_progress () from /usr/lib/libopen-pal.so.13
#4  0x00007ffff1b3df65 in ompi_request_default_wait_all () from /usr/lib/libmpi.so.12
#5  0x00007ffff1b801fb in ompi_dpm_base_disconnect_waitall () from /usr/lib/libmpi.so.12
#6  0x00007fffec44544a in ?? () from /usr/lib/openmpi/lib/openmpi/mca_dpm_orte.so
#7  0x00007ffff1b52ed0 in PMPI_Comm_disconnect () from /usr/lib/libmpi.so.12
#8  0x00007ffff1dda969 in mpi_comm_disconnect (sexp_comm=<optimized out>) at Rmpi.c:1078
#9  0x00007ffff78f5d90 in ?? () from /usr/lib/R/lib/libR.so
#10 0x00007ffff792c4ff in Rf_eval () from /usr/lib/R/lib/libR.so
...

My software versions are:
Ubuntu 16.04
OpenMPI 1.10.2
R 3.2.3

However, on CentOS 7, with OpenMPI 3.1.0 and R 3.5.0, it works!

So, does the issue stem from MPI or Rmpi?

Regards,
Cédric.


