[R-sig-hpc] Rmpi not spawning across nodes

Russell Pierce russell.s.pierce at gmail.com
Thu Jun 26 16:30:19 CEST 2014


I am having difficulty getting Rmpi to spawn across nodes.  My system
administrator is knowledgeable, but unfamiliar with R.  Other jobs are
able to run across nodes on the cluster without difficulty.  The
system I am working on has multiple nodes running R 3.0.2 on
x86_64-redhat-linux-gnu (64-bit) with Rmpi_0.6-3 and openmpi
version 1.6.5 compiled with a nopsm option.  nopsm was set while
tracking down another error message on the basis of another post
elsewhere (http://www.open-mpi.org/community/lists/users/2011/10/17660.php)
and seemed to help get Rmpi to compile and run on the remote node.
Rmpi was specifically R CMD INSTALLed against this nopsm version of
openmpi.

What I'd like to be able to do, as a proof of concept, is run R
interactively with access to the multiple nodes on the cluster.  Here
is my minimal example...

From the login node I can run:
qsub -I -V -l nodes=2:ppn=12

I am transferred to one of the computation nodes, and I can tell that
I’ve been assigned two nodes to work on using the “mynodes” command in
bash.  When I run ‘cat $PBS_NODEFILE’ I get a list of each node name
repeated 16 times.  Therefore, I am reasonably sure I was actually
assigned distinct nodes.
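
The node file check described above can be made more compact by counting how often each host appears.  What follows is only a minimal sketch, assuming the usual sort, uniq and awk tools are available on the computation node; count_slots is a hypothetical helper name, not something provided by PBS or openmpi.

```shell
# Summarize a PBS node file: one line per distinct host with its slot count.
# count_slots is a hypothetical helper name, not part of PBS or openmpi.
count_slots() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Inside a PBS job, PBS_NODEFILE points at the allocation list.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    count_slots "$PBS_NODEFILE"
fi
```

Two output lines with different host names would indicate that two distinct nodes were allocated.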

I launch R with the bash command:
mpirun -np 1 -hostfile $PBS_NODEFILE R --interactive --vanilla

I've also tried using the -n option rather than -np as I've seen in
some other sample scripts with similar results.
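
One way to check whether mpirun by itself places processes on both nodes, independent of R, is to launch hostname once per line of the host file and tally the output.  This is a sketch under assumptions rather than a definitive test: check_mpi_hosts is a hypothetical helper name, and it relies only on the -np and -hostfile options already used above.

```shell
# check_mpi_hosts is a hypothetical helper name. It launches one `hostname`
# process per line of the given host file and tallies where they ran.
check_mpi_hosts() {
    hostfile=$1
    # Arithmetic expansion strips any padding that wc adds to the count.
    nslots=$(( $(wc -l < "$hostfile") ))
    mpirun -np "$nslots" -hostfile "$hostfile" hostname | sort | uniq -c
}

# Inside the interactive PBS job:
#   check_mpi_hosts "$PBS_NODEFILE"
```

If every process reports the same host name here, the problem probably lies in how mpirun and the scheduler launch processes rather than in Rmpi itself.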

Within R on one of the computation nodes I type the following commands:
library(Rmpi)
mpi.spawn.Rslaves()
mpi.remote.exec(paste(Sys.info()[c("nodename")], "checking in as",
                      mpi.comm.rank(), "of", mpi.comm.size()))

... the results of these commands indicate that all of the slaves
started on the same node.

I saw the "Rmpi spawning across nodes" topic from March of 2012.
"Snow Not Distributing" from 2012 demonstrates a similar problem.  I
tried Ex60-HelloWorldSnow from that source, but all results indicate
that they were generated from the same node.

Is what I am aiming to do possible?  If so, is there something I am
doing incorrectly or that I need to check/report to help diagnose the
problem?


