[R-sig-hpc] Rmpi working with OpenMPI and PBSPro but snow fails

luke at stat.uiowa.edu
Wed Mar 4 14:49:52 CET 2009


On Wed, 4 Mar 2009, Huw Lynes wrote:

>
> Sorry for the long post but I'm having trouble getting snow to behave
> sensibly and was hoping someone else can spot where I'm going wrong.
>
> I've managed to get to the point where Rmpi seems to be working properly
> with PBSPro using OpenMPI. This is with Rmpi 0.5-7 and OpenMPI 1.3.
> OpenMPI is compiled against the task manager API from PBSPro, and a set
> of MPI test programs all run as expected. Running the following R
> script:
>
> ------------------------------------------------------------------------
> #add the R MPI package if it is not already loaded.
> if  (!is.loaded("mpi_initialize")) {
>   library("Rmpi")
> }
> # In case R exits unexpectedly, have it automatically clean up
> # resources taken up by Rmpi (slaves, memory, etc...)
>
> .Last <- function(){
> if (is.loaded("mpi_initialize")){
>      if (mpi.comm.size(1) > 0){
>          print("Please use mpi.close.Rslaves() to close slaves.")
>          mpi.close.Rslaves()
>      }
>      print("Please use mpi.quit() to quit R")
>      .Call("mpi_finalize")
>  }
> }
>
> # Tell all slaves to return a message identifying themselves
> mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
> print("RANK")
> print(mpi.comm.rank())
>
> # Tell all slaves to close down, and exit the program
> mpi.close.Rslaves()
> mpi.quit()
> -----------------------------------------------------------------------
>
> as part of the following PBS script:
>
> ------------------------------------------------------------------------
> #!/bin/bash
> #PBS -q workq
> #PBS -l select=2:ncpus=4:mpiprocs=4
> #PBS -l place=scatter:excl
>
>
> module load apps/R
> module load libs/R-mpi
>
> cd $PBS_O_WORKDIR
> cat $PBS_NODEFILE
>
> mpiexec R --no-save -q -f mpi_test.r
> ------------------------------------------------------------------------
>
> gives the following output:
>
> -------------------------------------------------------------------------
> arccacluster178
> arccacluster178
> arccacluster178
> arccacluster178
> arccacluster180
> arccacluster180
> arccacluster180
> arccacluster180
> WARNING: ignoring environment value of R_HOME
> WARNING: ignoring environment value of R_HOME
> WARNING: ignoring environment value of R_HOME
> WARNING: ignoring environment value of R_HOME
> WARNING: ignoring environment value of R_HOME
> WARNING: ignoring environment value of R_HOME
> WARNING: ignoring environment value of R_HOME
> WARNING: ignoring environment value of R_HOME
> master (rank 0, comm 1) of size 8 is running on: arccacluster178
> slave1 (rank 1, comm 1) of size 8 is running on: arccacluster178
> slave2 (rank 2, comm 1) of size 8 is running on: arccacluster178
> slave3 (rank 3, comm 1) of size 8 is running on: arccacluster178
> slave4 (rank 4, comm 1) of size 8 is running on: arccacluster180
> slave5 (rank 5, comm 1) of size 8 is running on: arccacluster180
> slave6 (rank 6, comm 1) of size 8 is running on: arccacluster180
> slave7 (rank 7, comm 1) of size 8 is running on: arccacluster180
>> #add the R MPI package if it is not already loaded.
>> if  (!is.loaded("mpi_initialize")) {
> +    library("Rmpi")
> + }
>> # In case R exits unexpectedly, have it automatically clean up
>> # resources taken up by Rmpi (slaves, memory, etc...)
>>
>> .Last <- function(){
> + if (is.loaded("mpi_initialize")){
> +       if (mpi.comm.size(1) > 0){
> +           print("Please use mpi.close.Rslaves() to close slaves.")
> +           mpi.close.Rslaves()
> +       }
> +       print("Please use mpi.quit() to quit R")
> +       .Call("mpi_finalize")
> +   }
> + }
>>
>> # Tell all slaves to return a message identifying themselves
>> mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
> $slave1
> [1] "I am 1 of 8"
>
> $slave2
> [1] "I am 2 of 8"
>
> $slave3
> [1] "I am 3 of 8"
>
> $slave4
> [1] "I am 4 of 8"
>
> $slave5
> [1] "I am 5 of 8"
>
> $slave6
> [1] "I am 6 of 8"
>
> $slave7
> [1] "I am 7 of 8"
>
>> print("RANK")
> [1] "RANK"
>> print(mpi.comm.rank())
> [1] 0
>>
>> # Tell all slaves to close down, and exit the program
>> mpi.close.Rslaves()
> [1] 1
>> mpi.quit()
> -----------------------------------------------------------------------
>
> Which kind of makes sense. We are using the Rprofile that ships with
> Rmpi, which does all the MPI setup for us. It seems that once that
> profile has run, only the master process actually parses the R script.
> That isn't what I was expecting based on using MPI in other languages,
> but I can see how it would be useful.
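
That is indeed how the Rmpi Rprofile behaves: every rank other than 0
ends up in a command loop and the rank 0 master drives them. A minimal
sketch of pushing work to the waiting slaves (some_data is just a
placeholder object):

some_data <- 1:100                       # example data on the master
mpi.bcast.Robj2slave(some_data)          # copy the object to every slave
mpi.remote.exec(sum(some_data))          # evaluate an expression on each slave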
>
> Moving on to snow in the same environment: trying to set things up by
> calling getMPICluster() returns an error from checkCluster() saying
> that there is something wrong with the cluster.

I don't know what "Moving on to snow" means exactly, as you don't give
details of how you are starting things up, so I have to guess.  If you
are using mpiexec then you need to run snow via the RMPISNOW shell
script, which for NPROCS processes sets up a master and a cluster of
NPROCS - 1 workers, and then use

cl <- makeCluster()

to access the already running cluster.
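
For example (a sketch only, not tested against PBSPro; it assumes the
RMPISNOW script installed with the snow package is on your PATH and that
your mpiexec forwards stdin to rank 0), the launch step of the PBS
script would become something like:

------------------------------------------------------------------------
cd $PBS_O_WORKDIR

# RMPISNOW starts R on every rank: rank 0 reads the script as the
# master, and the remaining ranks become snow workers
mpiexec RMPISNOW < snow_test.r > snow_test.Rout
------------------------------------------------------------------------

with snow_test.r along the lines of:

------------------------------------------------------------------------
library(snow)

# makeCluster() with no arguments attaches to the cluster that RMPISNOW
# has already set up (NPROCS - 1 workers)
cl <- makeCluster()

# have each worker report the node it is running on
print(clusterCall(cl, function() Sys.info()[["nodename"]]))

stopCluster(cl)
------------------------------------------------------------------------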

luke

>
> If I try to manually create a cluster by doing something like:
>
> ncpus <- as.numeric(Sys.getenv("NCPUS"))
> cl <- makeCluster(ncpus, type = "MPI")
>
> I get an error that the cluster already exists.
>
> Any thoughts on what I should be looking at to try and chase this down?
>
> Thanks,
> Huw
>
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


