[R-sig-hpc] Rmpi working with OpenMPI and PBSPro but snow fails
Huw Lynes
lynesh at cardiff.ac.uk
Wed Mar 4 15:13:37 CET 2009
On Wed, 2009-03-04 at 07:49 -0600, luke at stat.uiowa.edu wrote:
> On Wed, 4 Mar 2009, Huw Lynes wrote:
>
> >
Hi Luke,
Thanks for the quick response.
> > Moving on to snow in the same environment, trying to set up by using
> > getMPIcluster() returns an error in checkCluster() saying that there is
> > something wrong with the cluster.
>
> I don't know what "Moving to snow" means exactly as you don't give
> details of how you are starting things up so I have to guess. If you
> are using mpiexec then you need to run snow via the RMPISNOW shell
> script, which for NPROCS sets up a master and a cluster with NPROCS -
> 1 workers, and then use
>
> cl <- makeCluster()
>
> to access the already running cluster.
>
If I take the following trivial R script:
------------------------------------------------------------------------
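library(Rmpi)
library(snow)
# Attach to the cluster that RMPISNOW has already started
cl <- makeCluster()
# Ask every worker for its hostname and architecture
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
stopCluster(cl)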
------------------------------------------------------------------------
and run it as
------------------------------------------------------------------------
#!/bin/bash
#PBS -q SMP_queue
#PBS -l select=1:ncpus=4:mpiprocs=4
#PBS -l place=scatter:excl
module load apps/R
module load libs/R-mpi
cd "$PBS_O_WORKDIR"
# List the nodes PBS allocated to this job
cat "$PBS_NODEFILE"
mpiexec RMPISNOW -f snowtest_solo.r
-----------------------------------------------------------------------
all the R processes just sit there spinning rather than doing anything
useful and I have to kill the job.
Following the suggestion in this mail:
https://stat.ethz.ch/pipermail/r-sig-hpc/2009-January/000069.html
results in the same problem: the R processes just spin. I suspect that
something about my OpenMPI setup means snow is failing to set up a
master process, so all four processes end up as slaves spinning on a
network poll.
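One way to check that suspicion, as a rough sketch rather than a fix:
launch a small script under mpiexec in place of RMPISNOW and see whether
each process is handed a rank it could use to tell master from worker.
This assumes the master/worker split is decided from a rank environment
variable exported by the MPI launcher, which nothing in this thread
confirms. The variable names below are my assumptions about what
different MPI implementations export and should be checked against the
installed OpenMPI; the file name rankcheck.sh and the function names are
made up for illustration.

```shell
#!/bin/sh
# rankcheck.sh (hypothetical name): report which rank variable, if any,
# the MPI launcher exported to this process.

# Candidate variable names are assumptions, not taken from this thread:
# OMPI_COMM_WORLD_RANK and OMPI_MCA_ns_nds_vpid for OpenMPI,
# PMI_RANK for MPICH-style launchers, LAMRANK for LAM/MPI.
detect_rank() {
    for name in OMPI_COMM_WORLD_RANK OMPI_MCA_ns_nds_vpid PMI_RANK LAMRANK; do
        # Indirect lookup: read the variable whose name is held in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
            return 0
        fi
    done
    # Nothing set: a wrapper relying on these names could not pick a master.
    printf 'no-rank-variable\n'
}

# Rank 0 would be the master; every other rank would be a worker.
role_for_rank() {
    if [ "$1" -eq 0 ]; then
        echo master
    else
        echo worker
    fi
}

# Print one line per process: host, detected variable, and implied role.
report() {
    found=$(detect_rank)
    case "$found" in
        *=*) printf '%s %s role=%s\n' "$(uname -n)" "$found" \
                 "$(role_for_rank "${found#*=}")" ;;
        *)   printf '%s %s role=undetermined\n' "$(uname -n)" "$found" ;;
    esac
}

report
```

Run as "mpiexec sh rankcheck.sh" from the same job script, four
processes should each print a line. If exactly one line shows
role=master, the launcher is handing out ranks under one of these
names; if every line shows role=undetermined, the rank is not reaching
the processes under any name tried here, which would fit all four
ending up as slaves.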
--
Huw Lynes | Advanced Research Computing
HEC Sysadmin | Cardiff University
| Redwood Building,
Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB