[R-sig-hpc] Rmpi working with OpenMPI and PBSPro but snow fails

Huw Lynes lynesh at cardiff.ac.uk
Wed Mar 4 15:13:37 CET 2009


On Wed, 2009-03-04 at 07:49 -0600, luke at stat.uiowa.edu wrote:
> On Wed, 4 Mar 2009, Huw Lynes wrote:
> 
> >

Hi Luke,

Thanks for the quick response.

> > Moving on to snow in the same environment, trying to set up the cluster
> > with getMPIcluster() returns an error from checkCluster() saying that
> > there is something wrong with the cluster.
> 
> I don't know what "Moving to snow" means exactly, as you don't give
> details of how you are starting things up, so I have to guess.  If you
> are using mpiexec then you need to run snow via the RMPISNOW shell
> script, which for NPROCS processes sets up a master and a cluster with
> NPROCS - 1 workers, and then use
> 
> cl <- makeCluster()
> 
> to access the already running cluster.
> 
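
To make the expected layout concrete, the split described above (one master plus NPROCS - 1 workers) can be sketched in shell. This is only an illustration under stated assumptions: `count_ranks` and `expected_workers` are hypothetical helper names, not part of snow or PBS, and the sketch assumes the PBS node file holds one line per MPI rank.

```shell
#!/bin/bash
# Hypothetical helper: count MPI ranks as the number of lines in a node file.
# Assumption: the file has one line per MPI rank.
count_ranks() {
    wc -l < "$1" | tr -d ' '
}

# Hypothetical helper: one rank acts as the master, the rest are workers.
expected_workers() {
    echo $(( $1 - 1 ))
}

# Only report when running inside a PBS job that provides a node file.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "${PBS_NODEFILE}" ]; then
    nprocs=$(count_ranks "$PBS_NODEFILE")
    echo "MPI ranks: $nprocs, expected snow workers: $(expected_workers "$nprocs")"
fi
```

With select=1:ncpus=4:mpiprocs=4 this should report four ranks and three expected workers, provided the node file really does list four entries.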

If I take the following trivial R script:

------------------------------------------------------------------------

library(Rmpi)
library(snow)

cl <- makeCluster()
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
stopCluster(cl)
------------------------------------------------------------------------

and run it with this PBS job script:
------------------------------------------------------------------------
#!/bin/bash
#PBS -q SMP_queue
#PBS -l select=1:ncpus=4:mpiprocs=4
#PBS -l place=scatter:excl


module load apps/R
module load libs/R-mpi

cd $PBS_O_WORKDIR
cat $PBS_NODEFILE

mpiexec RMPISNOW -f snowtest_solo.r
-----------------------------------------------------------------------

all the R processes just sit there spinning rather than doing anything
useful and I have to kill the job.

The suggestion in this mail:
https://stat.ethz.ch/pipermail/r-sig-hpc/2009-January/000069.html

results in the same problem of R spinning. I suspect that something is
different about my OpenMPI setup, so that snow fails to set up a master
process and all four processes end up as slaves, spinning on a network
poll.
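
One way to test that suspicion would be to check what rank information each launched process actually sees. The following is a minimal diagnostic sketch, not part of snow: `role_for_rank` is a hypothetical helper, and it assumes mpiexec exports `OMPI_COMM_WORLD_RANK` to each process, which Open MPI releases from 1.3 onward do; older releases may leave it unset. How RMPISNOW itself picks the master has not been checked here.

```shell
#!/bin/bash
# Hypothetical diagnostic: report the role a process would be given if the
# master were chosen from the rank variable exported by Open MPI.
# Assumption: OMPI_COMM_WORLD_RANK is set by mpiexec; if it is not,
# the role is reported as "unknown".
role_for_rank() {
    case "$1" in
        "") echo "unknown" ;;   # no rank information in the environment
        0)  echo "master" ;;    # rank 0 would act as the master
        *)  echo "worker" ;;    # every other rank would be a worker
    esac
}

rank="${OMPI_COMM_WORLD_RANK:-}"
echo "$(uname -n): rank=${rank:-unset} role=$(role_for_rank "$rank")"
```

Saved under a hypothetical name such as rankcheck.sh and launched with `mpiexec ./rankcheck.sh` from the same job script, this should print one master line and three worker lines. If every line reports "unknown", a wrapper that selects the master from this variable could not tell the processes apart.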

-- 
Huw Lynes                       | Advanced Research Computing
HEC Sysadmin                    | Cardiff University
                                | Redwood Building, 
Tel: +44 (0) 29208 70626        | King Edward VII Avenue, CF10 3NB


