[R-sig-hpc] Running Rmpi/OpenMPI issues
Tsai Li Ming
mailinglist at ltsai.com
Wed Mar 26 04:32:45 CET 2014
On 26 Mar, 2014, at 11:14 am, Ross Boylan <ross at biostat.ucsf.edu> wrote:
> On Sat, 2014-03-22 at 09:51 +0800, Tsai Li Ming wrote:
>> Hi,
>>
>> I have R 3.0.3 and OpenMPI 1.6.5.
>> Snow: 0.3-13
>> Rmpi: 0.6-3
>>
>> Here’s my test script:
>> library(snow)
>>
>> nbNodes <- 4
>> cl <- makeCluster(nbNodes, "MPI")
>> clusterCall(cl, function() Sys.info()[c("nodename","machine")])
>> mpi.quit()
>>
>> And the mpirun command:
>> /opt/openmpi-1.6.5-intel/bin/mpirun -np 1 -H host1,host2,host3,host4 R --no-save < ~/test_mpi.R
>
> Maybe this will help; my script to launch Rmpi is (originally all 1
> line):
> R_PROFILE_USER=~/KHC/sunbelt/Rmpiprofile \
>   LD_LIBRARY_PATH=/home/ross/install/lib:$LD_LIBRARY_PATH \
>   PATH=/home/ross/install/bin:/home/ross/install/lib64/R/bin:$PATH \
>   orterun -x R_PROFILE_USER -x LD_LIBRARY_PATH -x PATH \
>   -hostfile ~/KHC/sunbelt/hosts --prefix /home/ross/install R --no-save -q
>
> Observations:
> 1. If mpirun is not on the regular path, one must use --prefix to tell
> it where to look. Otherwise MPI won't find the program and won't be
> able to launch remotely.
>
> 2. For running within those remote sessions you may need to set PATH and
> LD_LIBRARY_PATH so stuff gets found.
>
> 3. I left out -np; when I used it I always set it to the actual number
> of processes (my hosts file looks like host1 slots=4). I thought -np 1
> would limit you to one process; evidently it doesn't.
>
> 4. Rmpi, and possibly snow, requires a special startup script that is
> distributed with the package. I used a modified version and set
> R_PROFILE_USER and exported that variable with -x.
>
> Ross Boylan
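For reference, points 1 and 3 above combine into a hostfile-based launch along these lines (the hostnames, slot counts, and OpenMPI prefix below are placeholders, not tested here):

```shell
# hosts file: one line per machine, slots = processes allowed there
cat > ~/hosts <<'EOF'
host1 slots=4
host2 slots=4
EOF

# --prefix tells the remote orted daemons where OpenMPI lives when
# mpirun is not on the default PATH of the remote login shells;
# -x exports the named environment variables to the remote processes
/opt/openmpi-1.6.5-intel/bin/mpirun -hostfile ~/hosts \
    --prefix /opt/openmpi-1.6.5-intel \
    -x PATH -x LD_LIBRARY_PATH \
    R --no-save -q < ~/test_mpi.R
```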
Thanks Ross,
I managed to get it up and running by copying the Rprofile from the Rmpi package into ~/.Rprofile and by calling:
$ mpirun -np 4 -H host1,host2,host3,host4 R --no-save < ~/test_rmpi.R
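In case it helps others: the Rprofile that ships with Rmpi can be located without hunting through the library path by asking R itself. This is just a sketch and assumes Rmpi is installed in the default library:

```shell
# Ask R where the Rmpi package keeps its bundled Rprofile, then copy it
cp "$(Rscript -e 'cat(system.file("Rprofile", package="Rmpi"))')" ~/.Rprofile
```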
Here’s my R script:
library(Rmpi)
library(boot)
mpi.remote.exec(mpi.get.processor.name())
mpi.close.Rslaves()
mpi.quit()
But I didn’t try with Snow.
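The snow version would presumably need an explicit stopCluster() before mpi.quit(), since the MPI workers are spawned processes that have to be shut down before MPI is finalized (which may also be why mpi.quit() hung earlier). An untested sketch:

```r
library(snow)

nbNodes <- 4
cl <- makeCluster(nbNodes, type = "MPI")
clusterCall(cl, function() Sys.info()[c("nodename", "machine")])
stopCluster(cl)  # shut down the spawned slaves first
mpi.quit()       # then finalize MPI and exit R
```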
>
>>
>> Here’s the output:
>>> cl <- makeCluster(nbNodes, "MPI")
>> Loading required package: Rmpi
>> 4 slaves are spawned successfully. 0 failed.
>>> clusterCall(cl, function() Sys.info()[c("nodename","machine")])
>> [[1]]
>> nodename machine
>> "host1" "x86_64"
>>
>> [[2]]
>> nodename machine
>> "host1" "x86_64"
>>
>> [[3]]
>> nodename machine
>> "host1" "x86_64"
>>
>> [[4]]
>> nodename machine
>> "host1" "x86_64"
>>
>>>
>>> mpi.quit()
>>
>> I followed the instructions from
>> http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf
>> (specifically, to use -np 1).
>>
>> 1. Why is it not running on the rest of the nodes? I can see all 4 processes on host1 and no orted daemon running.
>>
>> What should be the correct way to run this?
>>
>> I have also tested the CPI example directly with OpenMPI, and it works.
>>
>> 2. mpi.quit() just hangs there.
>>
>> =================
>>
>> I have tried an Rmpi example:
>> library(Rmpi)
>> rk <- mpi.comm.rank(0)
>> sz <- mpi.comm.size(0)
>> name <- mpi.get.processor.name()
>> cat("Hello, rank", rk, "size", sz, "on", name, "\n")
>> mpi.quit()
>>
>> $ /opt/openmpi-1.6.5-intel/bin/mpirun -np 4 -H host1,host2,host3,host4 R --no-save < ~/test_rmpi.r
>>
>> It hangs here:
>>> library(Rmpi) # calls MPI_Init
>>
>> 1. Running with -np 2, it hangs at library(Rmpi), just like -np 4
>>
>> 2. Running with -np 1, I can get a successful run
>>
>> 3. Running with -np 8, I get an error:
>>> library(Rmpi) # calls MPI_Init
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 4 with PID 38992 on
>> node numaq1.1dn exiting improperly. There are two reasons this could occur:
>>
>> 1. this process did not call "init" before exiting, but others in
>> the job did. This can cause a job to hang indefinitely while it waits
>> for all processes to call "init". By rule, if one process calls "init",
>> then ALL processes must call "init" prior to termination.
>>
>> 2. this process called "init", but exited without calling "finalize".
>> By rule, all processes that call "init" MUST call "finalize" prior to
>> exiting or it will be considered an "abnormal termination"
>>
>> This may have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>>
>>
>> Thanks!
>>
>> _______________________________________________
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>
>