[R-sig-hpc] stopCluster hangs instead of exits

Sajesh Singh
Sat Nov 16 18:24:48 CET 2019


Bennet,
  I have seen this issue before when using OpenMPI 2.x. After switching to OpenMPI 1.x, stopCluster() ran successfully.
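
  Since the behaviour seems to depend on the OpenMPI major version, it may help to confirm which one is picked up at run time. The following is only a sketch: parse_ompi_major() is a hypothetical helper of my own, not part of Rmpi or snow, and it assumes the mpirun on the PATH belongs to Open MPI. The mpirun on the PATH is not necessarily the library Rmpi was built against, so treat the result as a hint only.

```r
# Hypothetical helper (not part of Rmpi or snow): pull the major version
# number out of a version banner such as "mpirun (Open MPI) 3.1.4".
parse_ompi_major <- function(version_line) {
  m <- regmatches(version_line, regexpr("[0-9]+(\\.[0-9]+)+", version_line))
  if (length(m) == 0) return(NA_integer_)
  as.integer(strsplit(m, ".", fixed = TRUE)[[1]][1])
}

# Only query mpirun if it is actually on the PATH.
if (nzchar(Sys.which("mpirun"))) {
  banner <- system2("mpirun", "--version", stdout = TRUE, stderr = TRUE)
  if (length(banner) > 0) {
    message("Open MPI major version on PATH: ", parse_ompi_major(banner[1]))
  }
}
```

  Running this inside the same environment (modules loaded, same shell) as the failing R session makes the comparison meaningful.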


-Sajesh-

-----Original Message-----
From: R-sig-hpc <r-sig-hpc-bounces using r-project.org> On Behalf Of Bennet Fauber
Sent: Saturday, November 16, 2019 12:00 PM
To: r-sig-hpc using r-project.org
Subject: [R-sig-hpc] stopCluster hangs instead of exits

We have a newish installation and are having some issues with
stopCluster() hanging when the cluster object is created using

    cl <- makeMPIcluster(5)

from snow.

The base R is 3.6.1.  The version of Rmpi is 0.6-9.  The version of OpenMPI against which Rmpi was installed is 3.1.4.

The makeMPIcluster() call seems to work, and processes are created.  They look like this, for example:

bennet    26330  16163  0 11:07 pts/15   00:00:00 mpirun -np 1 Rmpi --no-restore --no-save

bennet    26369  26330 99 11:07 pts/15   00:00:23 /sw/arcts/centos7/stacks/gcc/8.2.0/R/3.6.1/lib64/R/bin/exec/R --slave --no-restore --file=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1/snow/RMPInode.R --args SNOWLIB=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1 OUT=/dev/null

bennet    26370  26330 99 11:07 pts/15   00:00:23 /sw/arcts/centos7/stacks/gcc/8.2.0/R/3.6.1/lib64/R/bin/exec/R --slave --no-restore --file=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1/snow/RMPInode.R --args SNOWLIB=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1 OUT=/dev/null

bennet    26371  26330 99 11:07 pts/15   00:00:23 /sw/arcts/centos7/stacks/gcc/8.2.0/R/3.6.1/lib64/R/bin/exec/R --slave --no-restore --file=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1/snow/RMPInode.R --args SNOWLIB=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1 OUT=/dev/null

bennet    26372  26330 99 11:07 pts/15   00:00:23 /sw/arcts/centos7/stacks/gcc/8.2.0/R/3.6.1/lib64/R/bin/exec/R --slave --no-restore --file=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1/snow/RMPInode.R --args SNOWLIB=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1 OUT=/dev/null

They seem able to do work and communicate OK.  The only issue comes when stopCluster(cl) is called: R hangs until it is interrupted with Ctrl-C, and then it exits entirely.

The test program simply gathers the host name from each slave.

> library(Rmpi)
> library(parallel)
> library(snow)

Attaching package: ‘snow’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, clusterSplit, makeCluster, parApply,
    parCapply, parLapply, parRapply, parSapply, splitIndices,
    stopCluster

>
> cl <- makeCluster(4)
    4 slaves are spawned successfully. 0 failed.
> clusterCall(cl, function() Sys.info()['nodename'])
[[1]]
                   nodename
"gl-build.arc-ts.umich.edu"

[[2]]
                   nodename
"gl-build.arc-ts.umich.edu"

[[3]]
                   nodename
"gl-build.arc-ts.umich.edu"

[[4]]
                   nodename
"gl-build.arc-ts.umich.edu"

> stopCluster(cl)

at which point intervention is required.
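
For anyone who wants to reproduce this non-interactively, the session above could be wrapped in a script along the following lines. This is a rough sketch, not a tested fix: run_hostname_check() is a made-up name, it assumes Rmpi and snow are installed and that MPI spawning works from the way the script is launched, and the timestamps are only there to show whether stopCluster() ever returns.

```r
# Sketch of the interactive test as a function; run_hostname_check() is a
# hypothetical name. Assumes Rmpi and snow are installed.
run_hostname_check <- function(n_workers = 4L) {
  if (!requireNamespace("Rmpi", quietly = TRUE) ||
      !requireNamespace("snow", quietly = TRUE)) {
    message("Rmpi and/or snow not installed; skipping")
    return(invisible(NULL))
  }
  # Attach the packages in the same order as the interactive session.
  library(Rmpi)
  library(snow)

  cl <- makeMPIcluster(n_workers)
  hosts <- clusterCall(cl, function() Sys.info()[["nodename"]])
  message("workers reported: ", paste(unlist(hosts), collapse = ", "))

  message("calling stopCluster() at ", format(Sys.time()))
  stopCluster(cl)  # the call reported to hang
  message("stopCluster() returned at ", format(Sys.time()))
  invisible(hosts)
}

# Example call (commented out so that sourcing the file spawns nothing):
# run_hostname_check(4L)
```

If the second timestamp never appears, the hang is in stopCluster() itself rather than in the earlier cluster calls.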

Any thoughts on what might be wrong and how I should go about fixing it?

Let me know if you need additional information, please.

Thank you,    -- bennet

_______________________________________________
R-sig-hpc mailing list
R-sig-hpc using r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc

