[R-sig-hpc] Rmpi: mpi.close.Rslaves() 'hangs'

Ei-ji Nakama nakama at ki.rim.or.jp
Thu Sep 28 06:55:59 CEST 2017


Hi,

The same problem occurs on Linux when using openmpi-2.x.
# There is no problem with openmpi 1.6 and openmpi 1.10

$ orte-ps
...
$ echo "bt" | gdb -p <PID>
Looping in MPI_Comm_disconnect...

$ mkdir -p ~/.openmpi ; echo pmix_base_verbose=100 >> ~/.openmpi/mca-params.conf
Debug information can be obtained by setting the above and executing
the script...
<<snip : debug result is long>>
I looked into it (but only a little).

When PMIX is used, the following environment variables are set:
> grep("^PMIX",names(Sys.getenv()),value=TRUE)
[1] "PMIX_DEBUG"         "PMIX_NAMESPACE"     "PMIX_RANK"
[4] "PMIX_SECURITY_MODE" "PMIX_SERVER_URI"

As an alternative there is MPI_Comm_free, so when PMIX is in use it
seems better to call MPI_Comm_free instead of MPI_Comm_disconnect.
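
The check used in the patch below can be sketched in shell as follows.
This is only an illustration: choose_teardown is a hypothetical helper
name, and it prints the name of the Rmpi call that would be chosen
rather than calling into MPI. An unset and an empty PMIX_NAMESPACE are
treated alike, as Sys.getenv("PMIX_NAMESPACE")=="" does in R.

```shell
# Hypothetical helper (not part of Rmpi): report which teardown call the
# patch would pick, based on whether PMIX_NAMESPACE is set and non-empty.
choose_teardown() {
    if [ -z "${PMIX_NAMESPACE:-}" ]; then
        # Not launched under PMIX: keep the original behaviour.
        echo "mpi.comm.disconnect"
    else
        # Launched under PMIX: avoid MPI_Comm_disconnect.
        echo "mpi.comm.free"
    fi
}

choose_teardown
```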

diff -ruN Rmpi.orig/R/Rparutilities.R Rmpi/R/Rparutilities.R
--- Rmpi.orig/R/Rparutilities.R    2016-05-31 23:12:53.000000000 +0900
+++ Rmpi/R/Rparutilities.R    2017-09-28 12:41:50.545396494 +0900
@@ -332,8 +332,12 @@
     }
 #     mpi.barrier(comm)
     if (comm >0){
-        if (is.loaded("mpi_comm_disconnect"))
-            mpi.comm.disconnect(comm)
+        if (is.loaded("mpi_comm_disconnect")){
+            if (Sys.getenv("PMIX_NAMESPACE")=="")
+                mpi.comm.disconnect(comm)
+            else
+                mpi.comm.free(comm)
+        }
         else
             mpi.comm.free(comm)
     }
diff -ruN Rmpi.orig/inst/Rslaves.sh Rmpi/inst/Rslaves.sh
--- Rmpi.orig/inst/Rslaves.sh    2012-09-05 01:17:59.000000000 +0900
+++ Rmpi/inst/Rslaves.sh    2017-09-27 15:07:05.205719837 +0900
@@ -14,7 +14,7 @@

 if  [ "$3" = "needlog" ]; then
     hn=`hostname -s`
-    $R_HOME/bin/R --no-init-file --slave --no-save -f  $1 > $hn.$2.$$.log 2>&1
+    exec $R_HOME/bin/R --no-init-file --slave --no-save -f  $1 > $hn.$2.$$.log 2>&1
 else
-    $R_HOME/bin/R --no-init-file --slave --no-save -f  $1 > /dev/null 2>&1
+    exec $R_HOME/bin/R --no-init-file --slave --no-save -f  $1 > /dev/null 2>&1
 fi
diff -ruN Rmpi.orig/inst/slavedaemon.R Rmpi/inst/slavedaemon.R
--- Rmpi.orig/inst/slavedaemon.R    2013-02-23 13:07:54.000000000 +0900
+++ Rmpi/inst/slavedaemon.R    2017-09-28 11:45:19.598288064 +0900
@@ -16,6 +16,9 @@
 repeat
     try(eval(mpi.bcast.cmd(rank=0,comm=.comm, nonblock=.nonblock, sleep=.sleep),envir=.GlobalEnv),TRUE)
 print("Done")
-invisible(mpi.comm.disconnect(.comm))
+if(Sys.getenv("PMIX_NAMESPACE")=="")
+    invisible(mpi.comm.disconnect(.comm))
+else
+    invisible(mpi.comm.free(.comm))
 invisible(mpi.comm.set.errhandler(0))
 mpi.quit()
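
The Rslaves.sh hunk above adds exec in front of the R invocation. The
patch does not say why, so the following is only a sketch of what exec
does in general: without it the command runs as a child of the wrapper
shell, with it the command replaces the wrapper and keeps its PID. The
variable names plain and replaced are illustrative only.

```shell
# Each command prints two PIDs: the wrapper shell's, then the PID of the
# command it starts.

# Without exec: the inner sh is a child, so the two PIDs differ.
# (The trailing "true" keeps the shell from optimising the last command.)
plain=$(sh -c 'echo "$$"; sh -c "echo \$\$"; true')

# With exec: the inner sh replaces the wrapper, so the two PIDs match.
replaced=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')

echo "without exec:" $plain
echo "with exec:   " $replaced
```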

Best Regards,
--


