[R-sig-hpc] Rmpi working with OpenMPI and PBSPro but snow fails

Huw Lynes lynesh at cardiff.ac.uk
Wed Mar 4 14:36:33 CET 2009


Sorry for the long post, but I'm having trouble getting snow to behave
sensibly and was hoping someone else could spot where I'm going wrong.

I've managed to get to the point where Rmpi seems to be working properly
with PBSPro using OpenMPI. This is with Rmpi 0.5-7 and OpenMPI 1.3.
OpenMPI is compiled against the PBSPro task manager API, and a set of
MPI test programs all run fine. Running the following R script:

------------------------------------------------------------------------
#add the R MPI package if it is not already loaded.
if  (!is.loaded("mpi_initialize")) {
   library("Rmpi")
}
# In case R exits unexpectedly, have it automatically clean up
# resources taken up by Rmpi (slaves, memory, etc...)

.Last <- function(){
if (is.loaded("mpi_initialize")){
      if (mpi.comm.size(1) > 0){
          print("Please use mpi.close.Rslaves() to close slaves.")
          mpi.close.Rslaves()
      }
      print("Please use mpi.quit() to quit R")
      .Call("mpi_finalize")
  }
}

# Tell all slaves to return a message identifying themselves
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
print("RANK")
print(mpi.comm.rank())

# Tell all slaves to close down, and exit the program
mpi.close.Rslaves()
mpi.quit()
-----------------------------------------------------------------------

as part of the following PBS script:

------------------------------------------------------------------------
#!/bin/bash
#PBS -q workq
#PBS -l select=2:ncpus=4:mpiprocs=4
#PBS -l place=scatter:excl


module load apps/R
module load libs/R-mpi

cd $PBS_O_WORKDIR
cat $PBS_NODEFILE

mpiexec R --no-save -q -f mpi_test.r
------------------------------------------------------------------------

gives the following output:

-------------------------------------------------------------------------
arccacluster178
arccacluster178
arccacluster178
arccacluster178
arccacluster180
arccacluster180
arccacluster180
arccacluster180
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME
master (rank 0, comm 1) of size 8 is running on: arccacluster178 
slave1 (rank 1, comm 1) of size 8 is running on: arccacluster178 
slave2 (rank 2, comm 1) of size 8 is running on: arccacluster178 
slave3 (rank 3, comm 1) of size 8 is running on: arccacluster178 
slave4 (rank 4, comm 1) of size 8 is running on: arccacluster180 
slave5 (rank 5, comm 1) of size 8 is running on: arccacluster180 
slave6 (rank 6, comm 1) of size 8 is running on: arccacluster180 
slave7 (rank 7, comm 1) of size 8 is running on: arccacluster180 
> #add the R MPI package if it is not already loaded.
> if  (!is.loaded("mpi_initialize")) {
+    library("Rmpi")
+ }
> # In case R exits unexpectedly, have it automatically clean up
> # resources taken up by Rmpi (slaves, memory, etc...)
> 
> .Last <- function(){
+ if (is.loaded("mpi_initialize")){
+       if (mpi.comm.size(1) > 0){
+           print("Please use mpi.close.Rslaves() to close slaves.")
+           mpi.close.Rslaves()
+       }
+       print("Please use mpi.quit() to quit R")
+       .Call("mpi_finalize")
+   }
+ }
> 
> # Tell all slaves to return a message identifying themselves
> mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
$slave1
[1] "I am 1 of 8"

$slave2
[1] "I am 2 of 8"

$slave3
[1] "I am 3 of 8"

$slave4
[1] "I am 4 of 8"

$slave5
[1] "I am 5 of 8"

$slave6
[1] "I am 6 of 8"

$slave7
[1] "I am 7 of 8"

> print("RANK")
[1] "RANK"
> print(mpi.comm.rank())
[1] 0
> 
> # Tell all slaves to close down, and exit the program
> mpi.close.Rslaves()
[1] 1
> mpi.quit()
-----------------------------------------------------------------------

Which kind of makes sense. We are using the Rprofile that ships with
Rmpi, which does all the MPI setup for us. It seems that once that
profile has run, only the master process actually parses the R script,
which isn't what I was expecting based on MPI in other languages, but I
can see how it would be useful.
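
To illustrate what I mean (this is just my reading of the Rmpi docs, not
something taken from the profile itself): the slaves sit in a worker loop
and only run what the master sends them, e.g.

------------------------------------------------------------------------
# Only the master (rank 0) parses the script; the slaves started by the
# Rmpi Rprofile just evaluate whatever the master broadcasts to them.
mpi.bcast.cmd(library(stats))                         # run on every slave
mpi.remote.exec(paste("slave on", Sys.info()[["nodename"]]))
------------------------------------------------------------------------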

Moving on to snow in the same environment: trying to set up a cluster
with getMPIcluster() returns an error from checkCluster() saying that
there is something wrong with the cluster.

If I try to manually create a cluster by doing something like:

ncpus <- as.integer(Sys.getenv("NCPUS"))  # Sys.getenv() returns a string
cl <- makeCluster(ncpus, type = "MPI")

I get an error that the cluster already exists.
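
For completeness, the pattern I thought snow wanted (based on my reading
of its docs; the worker count of 7 is just to match the 8-way job above)
is roughly:

------------------------------------------------------------------------
library(snow)

# Reuse the cluster snow already knows about, if there is one;
# otherwise spawn workers over Rmpi ourselves.
cl <- getMPIcluster()
if (is.null(cl)) {
    cl <- makeMPIcluster(7)       # 7 workers + 1 master = 8 MPI ranks
}

print(clusterCall(cl, function() Sys.info()[["nodename"]]))
stopCluster(cl)
------------------------------------------------------------------------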

Any thoughts on what I should be looking at to try and chase this down?

Thanks,
Huw

-- 
Huw Lynes                       | Advanced Research Computing
HEC Sysadmin                    | Cardiff University
                                | Redwood Building, 
Tel: +44 (0) 29208 70626        | King Edward VII Avenue, CF10 3NB
