[R-sig-hpc] Running Rmpi/OpenMPI issues

Ross Boylan ross at biostat.ucsf.edu
Wed Mar 26 04:14:36 CET 2014


On Sat, 2014-03-22 at 09:51 +0800, Tsai Li Ming wrote:
> Hi,
> 
> I have R 3.0.3 and OpenMPI 1.6.5.
> Snow: 0.3-13
> Rmpi: 0.6-3
> 
> Here’s my test script:
> library(snow)
> 
> nbNodes <- 4
> cl <- makeCluster(nbNodes, "MPI")
> clusterCall(cl, function() Sys.info()[c("nodename","machine")])
> mpi.quit()
> 
> And the mpirun command:
> /opt/openmpi-1.6.5-intel/bin/mpirun -np 1 -H host1,host2,host3,host4 R --no-save < ~/test_mpi.R

Maybe this will help; my script to launch Rmpi is (originally all one
line, wrapped here with backslashes for readability):

R_PROFILE_USER=~/KHC/sunbelt/Rmpiprofile \
LD_LIBRARY_PATH=/home/ross/install/lib:$LD_LIBRARY_PATH \
PATH=/home/ross/install/bin:/home/ross/install/lib64/R/bin:$PATH \
orterun -x R_PROFILE_USER -x LD_LIBRARY_PATH -x PATH \
    -hostfile ~/KHC/sunbelt/hosts --prefix /home/ross/install \
    R --no-save -q
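
(The three -x flags export R_PROFILE_USER, LD_LIBRARY_PATH, and PATH,
which are set inline at the front of the command, into the environment
of the remotely launched processes; -hostfile supplies the node list
instead of -H.)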

Observations:
1. If the Open MPI installation is not on the regular path on the remote
machines, one must use --prefix to tell it where to look.  Otherwise
Open MPI won't find its own executables on the remote hosts and won't be
able to launch remotely.

2. Within those remote sessions you may need to set PATH and
LD_LIBRARY_PATH so that the R executable and its shared libraries are
found; hence the -x flags above.

3. I left out -np; when I used it I always set it to the actual number
of processes (my hosts file looks like "host1 slots=4").  I thought
-np 1 would limit you to one process; evidently it doesn't.

4. Rmpi, and possibly snow, requires a special startup script that is
distributed with the package.  I used a modified version and set
R_PROFILE_USER and exported that variable with -x.
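
For reference, the hosts file mentioned in item 3 is an ordinary Open
MPI hostfile, one machine per line; something along these lines
(hostnames and slot counts here are only examples):

host1 slots=4
host2 slots=4
host3 slots=4
host4 slots=4

The startup script in item 4 ships inside the Rmpi package itself, so an
untested sketch for finding it is (R_PROFILE_USER can point at that file
directly, or at a modified copy as I do):

Rscript -e 'cat(system.file("Rprofile", package = "Rmpi"))'
# prints the installed path, something like .../library/Rmpi/Rprofile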

Ross Boylan

> 
> Here’s the output:
> > cl <- makeCluster(nbNodes, "MPI")
> Loading required package: Rmpi
> 	4 slaves are spawned successfully. 0 failed.
> > clusterCall(cl, function() Sys.info()[c("nodename","machine")])
> [[1]]
>    nodename      machine 
> "host1"     "x86_64" 
> 
> [[2]]
>    nodename      machine 
> "host1"     "x86_64" 
> 
> [[3]]
>    nodename      machine 
> "host1"     "x86_64" 
> 
> [[4]]
>    nodename      machine 
> "host1"     "x86_64" 
> 
> > 
> > mpi.quit()
> 
> I followed the instructions from:
> http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf
> , specifically to use -np 1
> 
> 1. Why is it not running on the rest of the nodes? I can see all 4 processes on host1 and no orted daemon running.
> 
> What should be the correct way to run this? 
> 
> I have also tested the CPI example with OpenMPI, and it works.
> 
> 2. mpi.quit() just hangs there.
> 
> =================
> 
> I have tried an Rmpi example:
> library(Rmpi) 
> rk <- mpi.comm.rank(0)
> sz <- mpi.comm.size(0)
> name <- mpi.get.processor.name()
> cat("Hello, rank", rk, "size", sz, "on", name, "\n")
> mpi.quit()
> 
> $ /opt/openmpi-1.6.5-intel/bin/mpirun -np 4 -H host1,host2,host3,host4 R --no-save < ~/test_rmpi.r 
> 
> It hangs here:
> > library(Rmpi) # calls MPI_Init
> 
> 1. Running with -np 2, hangs at the library(Rmpi), similar to -np 4
> 
> 2. Running with -np 1, I can get a successful run
> 
> 3. Running with -np 8 , I get an error:
> > library(Rmpi) # calls MPI_Init
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 4 with PID 38992 on
> node numaq1.1dn exiting improperly. There are two reasons this could occur:
> 
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
> 
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
> 
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> 
> 
> Thanks!
> 


