[R-sig-hpc] Running Rmpi/OpenMPI issues

Tsai Li Ming mailinglist at ltsai.com
Sat Mar 22 02:51:40 CET 2014


Hi,

I have R 3.0.3 and OpenMPI 1.6.5.
snow: 0.3-13
Rmpi: 0.6-3

Here’s my test script:
library(snow)

nbNodes <- 4
cl <- makeCluster(nbNodes, "MPI")
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
mpi.quit()

And the mpirun command:
/opt/openmpi-1.6.5-intel/bin/mpirun -np 1 -H host1,host2,host3,host4 R --no-save < ~/test_mpi.R

Here’s the output:
> cl <- makeCluster(nbNodes, "MPI")
Loading required package: Rmpi
	4 slaves are spawned successfully. 0 failed.
> clusterCall(cl, function() Sys.info()[c("nodename","machine")])
[[1]]
   nodename      machine 
"host1"     "x86_64" 

[[2]]
   nodename      machine 
"host1"     "x86_64" 

[[3]]
   nodename      machine 
"host1"     "x86_64" 

[[4]]
   nodename      machine 
"host1"     "x86_64" 

> 
> mpi.quit()

I followed the instructions from:
http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf
specifically the advice to use -np 1.

1. Why is it not running on the rest of the nodes? I can see all 4 processes on host1 and no orted daemon running.

What should be the correct way to run this? 

I have also tested a CPI example program using OpenMPI, and it works.

2. mpi.quit() just hangs there.
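
To make the placement problem in point 1 easier to see at a glance, the per-worker results could be tallied by host. The following is only a minimal sketch: count_hosts is a hypothetical helper (not part of snow or Rmpi), and results is stand-in data that mirrors the clusterCall output shown above rather than a live cluster call.

```r
# Hypothetical helper (not part of snow or Rmpi): tally workers per host.
# `results` is assumed to be a list like the one clusterCall() returned above,
# where each element is a named character vector containing "nodename".
count_hosts <- function(results) {
  hosts <- vapply(results, function(x) unname(x[["nodename"]]), character(1))
  table(hosts)
}

# Stand-in data mirroring the output shown above (all four workers on host1).
results <- replicate(
  4,
  c(nodename = "host1", machine = "x86_64"),
  simplify = FALSE
)

print(count_hosts(results))
```

With the stand-in data this gives a single entry, host1, with a count of 4. On a run where the workers were actually spread across the hosts, one entry per host would be expected instead.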

=================

I have also tried an Rmpi example:
library(Rmpi) 
rk <- mpi.comm.rank(0)
sz <- mpi.comm.size(0)
name <- mpi.get.processor.name()
cat("Hello, rank", rk, "size", sz, "on", name, "\n")
mpi.quit()

$ /opt/openmpi-1.6.5-intel/bin/mpirun -np 4 -H host1,host2,host3,host4 R --no-save < ~/test_rmpi.r 

It hangs here:
> library(Rmpi) # calls MPI_Init

1. Running with -np 2 also hangs at library(Rmpi), the same as with -np 4.

2. Running with -np 1, I get a successful run.

3. Running with -np 8, I get an error:
> library(Rmpi) # calls MPI_Init
--------------------------------------------------------------------------
mpirun has exited due to process rank 4 with PID 38992 on
node numaq1.1dn exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------


Thanks!


