[R-sig-hpc] stopCluster hangs instead of exits
Bennet Fauber
bennet at umich.edu
Sat Nov 16 18:18:16 CET 2019
I have a small test program that uses only Rmpi functions and performs
a similar task, and it runs cleanly to completion.
Rmpi-test.R
--------------------------------------------------------
# Load the R MPI package if it is not already loaded.
if (!is.loaded("mpi_initialize")) {
library("Rmpi")
}
# Spawn N-1 workers
mpi.spawn.Rslaves(nslaves=mpi.universe.size()-1)
# The command we want to run on all the nodes/processors we have
mpi.remote.exec(paste("I am ", mpi.comm.rank(), " of ",
                      mpi.comm.size(), " on ",
                      Sys.info()[c("nodename")]))
# Stop the worker processes
mpi.close.Rslaves()
# Close down the MPI processes and quit R
mpi.quit()
--------------------------------------------------------
The MPI installation itself is the cluster-wide installation, and many
other applications use it successfully, so I am fairly confident that
MPI itself is working.
The issue seems to be one of interaction between snow's stopCluster() and...?
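For reference, this is roughly the shape of the snow session that hangs (a
minimal hypothetical sketch, not my actual job; the worker count is a
placeholder):
--------------------------------------------------------
# Minimal snow/MPI session (sketch only)
library(snow)
# Spawn three workers via Rmpi
cl <- makeCluster(3, type = "MPI")
# Sanity check: ask each worker for its hostname
clusterCall(cl, function() Sys.info()[["nodename"]])
# This is the call that hangs instead of returning
stopCluster(cl)
# Shut down MPI and quit R
Rmpi::mpi.quit()
--------------------------------------------------------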
Output from running the Rmpi-test.R script above:
> # Load the R MPI package if it is not already loaded.
> if (!is.loaded("mpi_initialize")) {
+ library("Rmpi")
+ }
> # Spawn N-1 workers
> paste(" There are ", mpi.universe.size(), " ranks in this universe")
[1] " There are 36 ranks in this universe"
> mpi.spawn.Rslaves(nslaves=3)
3 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 4 is running on: gl-build
slave1 (rank 1, comm 1) of size 4 is running on: gl-build
slave2 (rank 2, comm 1) of size 4 is running on: gl-build
slave3 (rank 3, comm 1) of size 4 is running on: gl-build
>
> # The command we want to run on all the nodes/processors we have
> mpi.remote.exec(paste("I am ", mpi.comm.rank(), " of ",
+                       mpi.comm.size(), " on ",
+                       Sys.info()[c("nodename")]))
$slave1
[1] "I am 1 of 4 on gl-build.arc-ts.umich.edu"
$slave2
[1] "I am 2 of 4 on gl-build.arc-ts.umich.edu"
$slave3
[1] "I am 3 of 4 on gl-build.arc-ts.umich.edu"
>
> # Stop the worker processes
> mpi.close.Rslaves()
[1] 1
>
> # Close down the MPI processes and quit R
> mpi.quit()
On Sat, Nov 16, 2019 at 12:11 PM Dirk Eddelbuettel <edd at debian.org> wrote:
>
>
> On 16 November 2019 at 11:59, Bennet Fauber wrote:
> | Any thoughts on what might be wrong and how I should go about fixing it?
>
> I would think that is an OpenMPI issue.
>
> My inclination would be to try to replicate it with a pure C/C++ "hello MPI
> world" and check whether it returns cleanly or not when launched from
> `orterun` (or the like) with similar options.
>
> Dirk
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
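An R-only analogue of that check (a hypothetical sketch; the file name is
made up, and Dirk's actual suggestion is a pure C/C++ program, which takes R
out of the equation entirely) would be to test whether bare MPI startup and
shutdown return cleanly from R:
--------------------------------------------------------
# Hypothetical finalize-test.R; launch with e.g.
#   mpirun -np 1 Rscript finalize-test.R
library(Rmpi)
# comm 0 is MPI_COMM_WORLD in Rmpi's communicator numbering
cat("rank", mpi.comm.rank(0), "of", mpi.comm.size(0), "\n")
# If MPI shutdown is the problem, the hang should reproduce here
mpi.finalize()
--------------------------------------------------------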