[R-sig-hpc] stopCluster hangs instead of exits
Sajesh Singh
@@|ngh @end|ng |rom @mnh@org
Sat Nov 16 18:24:48 CET 2019
Bennet,
I have seen this issue before when using OpenMPI 2.x. After switching to OpenMPI 1.x I was able to run stopCluster() successfully.
-Sajesh-
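A small sketch of how one might check which Open MPI release series is in use before comparing notes. The helper names `mpi_major_version` and `hang_reported` are hypothetical, and the example banner is only assumed to resemble the first line of `mpirun --version`. The `>= 2` threshold is an extrapolation from this thread alone (hang seen with 2.x and 3.1.4, none with 1.x), not a documented rule.

```r
# Pull the major version number out of a version banner, for example the
# first line printed by `mpirun --version`.
mpi_major_version <- function(banner) {
  hit <- regmatches(banner, regexpr("[0-9]+(\\.[0-9]+)+", banner))
  if (length(hit) == 0) return(NA_integer_)   # no version string found
  as.integer(strsplit(hit, ".", fixed = TRUE)[[1]][1])
}

# Hypothetical helper: TRUE when the banner belongs to a release series
# for which this thread reports the stopCluster() hang (assumption: 2.x+).
hang_reported <- function(banner) {
  major <- mpi_major_version(banner)
  !is.na(major) && major >= 2
}

hang_reported("mpirun (Open MPI) 3.1.4")   # example banner, assumed format
```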
-----Original Message-----
From: R-sig-hpc <r-sig-hpc-bounces using r-project.org> On Behalf Of Bennet Fauber
Sent: Saturday, November 16, 2019 12:00 PM
To: r-sig-hpc using r-project.org
Subject: [R-sig-hpc] stopCluster hangs instead of exits
We have a newish installation and are having some issues with
stopCluster() hanging when the cluster object is created using
cl <- makeMPIcluster(5)
from snow.
The base R is 3.6.1. The version of Rmpi is 0.6-9. The version of OpenMPI against which Rmpi was installed is 3.1.4.
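For anyone trying to reproduce this, the same version details can be collected from within R. The following is only a sketch: `report_versions` is a hypothetical helper, and it assumes that the `mpirun` found on the PATH belongs to the same Open MPI installation Rmpi was built against, which may not hold on a system with several module stacks. It reads package metadata only, so it does not load Rmpi or start MPI.

```r
# Collect R, Rmpi, snow, and mpirun version information in one list.
report_versions <- function() {
  # packageVersion() reads the DESCRIPTION file; it does not load the package.
  pkg_version <- function(pkg) {
    tryCatch(as.character(utils::packageVersion(pkg)),
             error = function(e) NA_character_)   # package not installed
  }

  # Ask whichever mpirun is first on the PATH for its version banner.
  mpirun <- unname(Sys.which("mpirun"))
  banner <- NA_character_
  if (nzchar(mpirun)) {
    out <- tryCatch(
      suppressWarnings(system2(mpirun, "--version", stdout = TRUE, stderr = TRUE)),
      error = function(e) character(0)
    )
    if (length(out) > 0) banner <- out[1]
  }

  list(R      = as.character(getRversion()),
       Rmpi   = pkg_version("Rmpi"),
       snow   = pkg_version("snow"),
       mpirun = banner)
}

str(report_versions())
```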
The makeMPIcluster() seems to work, and processes are created. They look like this, for example,
bennet 26330 16163 0 11:07 pts/15 00:00:00 mpirun -np 1 Rmpi --no-restore --no-save
bennet 26369 26330 99 11:07 pts/15 00:00:23 /sw/arcts/centos7/stacks/gcc/8.2.0/R/3.6.1/lib64/R/bin/exec/R --slave --no-restore --file=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1/snow/RMPInode.R --args SNOWLIB=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1 OUT=/dev/null
bennet 26370 26330 99 11:07 pts/15 00:00:23 /sw/arcts/centos7/stacks/gcc/8.2.0/R/3.6.1/lib64/R/bin/exec/R --slave --no-restore --file=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1/snow/RMPInode.R --args SNOWLIB=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1 OUT=/dev/null
bennet 26371 26330 99 11:07 pts/15 00:00:23 /sw/arcts/centos7/stacks/gcc/8.2.0/R/3.6.1/lib64/R/bin/exec/R --slave --no-restore --file=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1/snow/RMPInode.R --args SNOWLIB=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1 OUT=/dev/null
bennet 26372 26330 99 11:07 pts/15 00:00:23 /sw/arcts/centos7/stacks/gcc/8.2.0/R/3.6.1/lib64/R/bin/exec/R --slave --no-restore --file=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1/snow/RMPInode.R --args SNOWLIB=/sw/arcts/centos7/stacks/gcc/8.2.0/Rmpi/0.6-9/R-3.6.1 OUT=/dev/null
They seem able to do work and communicate OK. The only issue comes when stopCluster(cl) is called: R hangs until it is interrupted with Ctrl-C, and then it exits entirely.
The test program simply gathers the host name from each slave.
> library(Rmpi)
> library(parallel)
> library(snow)

Attaching package: ‘snow’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, clusterSplit, makeCluster, parApply,
    parCapply, parLapply, parRapply, parSapply, splitIndices,
    stopCluster

>
> cl <- makeCluster(4)
4 slaves are spawned successfully. 0 failed.
> clusterCall(cl, function() Sys.info()['nodename'])
[[1]]
                   nodename
"gl-build.arc-ts.umich.edu"

[[2]]
                   nodename
"gl-build.arc-ts.umich.edu"

[[3]]
                   nodename
"gl-build.arc-ts.umich.edu"

[[4]]
                   nodename
"gl-build.arc-ts.umich.edu"

> stopCluster(cl)
at which point intervention is required.
Any thoughts on what might be wrong and how I should go about fixing it?
Let me know if you need additional information, please.
Thank you, -- bennet
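
One way to reproduce the hang without tying up an interactive session is to run the same steps in a child Rscript process with a time limit. This is a sketch only: `run_with_timeout` is a hypothetical helper, it relies on the `timeout` argument of `system2()` (available since R 3.5.0, which reports exit status 124 when the limit is hit), and the 120-second limit is an arbitrary choice. MPI worker processes spawned by the child may outlive the timeout and need to be cleaned up separately.

```r
# Run R code in a child Rscript process and give up after `seconds`.
# Returns "ok", "timed out", or "failed".
run_with_timeout <- function(code, seconds) {
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script))
  writeLines(code, script)

  rscript <- file.path(R.home("bin"), "Rscript")
  # system2() warns when the timeout is reached; only the status is needed here.
  status <- suppressWarnings(
    system2(rscript, c("--vanilla", shQuote(script)), timeout = seconds)
  )

  if (status == 0) "ok" else if (status == 124) "timed out" else "failed"
}

# The steps from the session above, written as a batch script.
mpi_test <- c(
  "library(Rmpi)",
  "library(snow)",
  "cl <- makeCluster(4)",
  "print(clusterCall(cl, function() Sys.info()['nodename']))",
  "stopCluster(cl)",
  "cat('stopCluster returned\\n')"
)

# Uncomment on a machine with Rmpi and snow installed:
# run_with_timeout(mpi_test, 120)
```

A result of "timed out" with the host names printed but no "stopCluster returned" line would point at the shutdown step rather than at cluster creation.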
_______________________________________________
R-sig-hpc mailing list
R-sig-hpc using r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc