[R-sig-hpc] Running Rmpi/OpenMPI issues

Tsai Li Ming mailinglist at ltsai.com
Wed Mar 26 04:32:45 CET 2014


On 26 Mar, 2014, at 11:14 am, Ross Boylan <ross at biostat.ucsf.edu> wrote:

> On Sat, 2014-03-22 at 09:51 +0800, Tsai Li Ming wrote:
>> Hi,
>> 
>> I have R 3.0.3 and OpenMPI 1.6.5.
>> Snow: 0.3-13
>> Rmpi: 0.6-3
>> 
>> Here’s my test script:
>> library(snow)
>> 
>> nbNodes <- 4
>> cl <- makeCluster(nbNodes, "MPI")
>> clusterCall(cl, function() Sys.info()[c("nodename","machine")])
>> mpi.quit()
>> 
>> And the mpirun command:
>> /opt/openmpi-1.6.5-intel/bin/mpirun -np 1 -H host1,host2,host3,host4 R --no-save < ~/test_mpi.R
> 
> Maybe this will help; my script to launch Rmpi is (shown here with line
> continuations; originally it was all one line):
> R_PROFILE_USER=~/KHC/sunbelt/Rmpiprofile \
> LD_LIBRARY_PATH=/home/ross/install/lib:$LD_LIBRARY_PATH \
> PATH=/home/ross/install/bin:/home/ross/install/lib64/R/bin:$PATH \
> orterun -x R_PROFILE_USER -x LD_LIBRARY_PATH -x PATH \
>   -hostfile ~/KHC/sunbelt/hosts --prefix /home/ross/install R --no-save -q
> 
> Observations:
> 1. If mpirun is not on the regular path, one must use --prefix to tell
> it where to look.  Otherwise MPI won't find the program and won't be
> able to launch remotely.
> 
> 2. For running within those remote sessions you may need to set PATH and
> LD_LIBRARY_PATH so stuff gets found.
> 
> 3. I left out -np; when I used it I always set it to the actual number
> of processes (my hosts file looks like host1 slots=4).  I thought np 1
> would limit you to one process; evidently it doesn't.
> 
> 4. Rmpi, and possibly snow, requires a special startup script that is
> distributed with the package.  I used a modified version and set
> R_PROFILE_USER and exported that variable with -x.
> 
> Ross Boylan

Thanks Ross,

I managed to get it up and running by copying the Rprofile from the Rmpi package into ~/.Rprofile and by calling:
$ mpirun -np 4 -H host1,host2,host3,host4 R --no-save < ~/test_rmpi.R
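Putting Ross's observations together, a fuller launch form would look roughly like the sketch below when mpirun is not on the default PATH of the remote shells. This is untested here; MPI_HOME and the Rprofile path are illustrative placeholders for a site-specific install, not values from the thread.

```shell
# Hedged launch sketch based on Ross's notes; paths are assumptions.
MPI_HOME=/opt/openmpi-1.6.5-intel            # assumed OpenMPI install prefix
export R_PROFILE_USER=~/.Rprofile            # the Rprofile copied from the Rmpi package
export LD_LIBRARY_PATH=$MPI_HOME/lib:$LD_LIBRARY_PATH

# --prefix tells the remote orted where OpenMPI lives (Ross's point 1);
# -x exports the environment variables to the remote sessions (point 2).
$MPI_HOME/bin/mpirun -np 4 -H host1,host2,host3,host4 \
    --prefix $MPI_HOME \
    -x R_PROFILE_USER -x LD_LIBRARY_PATH -x PATH \
    R --no-save -q < ~/test_rmpi.R
```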

Here’s my R script:
library(Rmpi)
library(boot)

mpi.remote.exec(mpi.get.processor.name())
mpi.close.Rslaves()
mpi.quit()

But I didn’t try with Snow. 
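For completeness, the snow equivalent of the test would look roughly like this. An untested sketch, assuming the same Rprofile/mpirun setup as above; makeCluster(), clusterCall(), and stopCluster() are the standard snow API, and mpi.quit() comes from Rmpi.

```r
# Untested sketch: snow on top of Rmpi, same launch setup as above.
library(snow)

cl <- makeCluster(4, type = "MPI")     # spawns 4 Rmpi slaves
# Report which host each slave actually landed on.
print(clusterCall(cl, function() Sys.info()[c("nodename", "machine")]))
stopCluster(cl)                        # shut the slaves down cleanly
mpi.quit()                             # finalize MPI before exiting R
```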



> 
>> 
>> Here’s the output:
>>> cl <- makeCluster(nbNodes, "MPI")
>> Loading required package: Rmpi
>> 	4 slaves are spawned successfully. 0 failed.
>>> clusterCall(cl, function() Sys.info()[c("nodename","machine")])
>> [[1]]
>>   nodename      machine 
>> "host1"      "x86_64" 
>> 
>> [[2]]
>>   nodename      machine 
>> "host1"      "x86_64" 
>> 
>> [[3]]
>>   nodename      machine 
>> "host1"      "x86_64" 
>> 
>> [[4]]
>>   nodename      machine 
>> "host1"      "x86_64" 
>> 
>>> 
>>> mpi.quit()
>> 
>> I followed the instructions from:
>> http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf
>> , specifically to use -np 1
>> 
>> 1. Why is it not running on the rest of the nodes? I can see all 4 processes on host1 and no orted daemon running.
>> 
>> What should be the correct way to run this? 
>> 
>> I have also tested the CPI example directly with OpenMPI, and it works.
>> 
>> 2. mpi.quit() just hangs there.
>> 
>> =================
>> 
>> I have tried a rmpi example:
>> library(Rmpi) 
>> rk <- mpi.comm.rank(0)
>> sz <- mpi.comm.size(0)
>> name <- mpi.get.processor.name()
>> cat("Hello, rank", rk, "size", sz, "on", name, "\n")
>> mpi.quit()
>> 
>> $ /opt/openmpi-1.6.5-intel/bin/mpirun -np 4 -H host1,host2,host3,host4 R --no-save < ~/test_rmpi.r 
>> 
>> It hangs here:
>>> library(Rmpi) # calls MPI_Init
>> 
>> 1. Running with -np 2, it hangs at library(Rmpi), same as with -np 4
>> 
>> 2. Running with -np 1, I can get a successful run
>> 
>> 3. Running with -np 8 , I get an error:
>>> library(Rmpi) # calls MPI_Init
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 4 with PID 38992 on
>> node numaq1.1dn exiting improperly. There are two reasons this could occur:
>> 
>> 1. this process did not call "init" before exiting, but others in
>> the job did. This can cause a job to hang indefinitely while it waits
>> for all processes to call "init". By rule, if one process calls "init",
>> then ALL processes must call "init" prior to termination.
>> 
>> 2. this process called "init", but exited without calling "finalize".
>> By rule, all processes that call "init" MUST call "finalize" prior to
>> exiting or it will be considered an "abnormal termination"
>> 
>> This may have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> 
>> 
>> Thanks!
>> 
>> _______________________________________________
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> 
> 


