[R-sig-hpc] openmpi/rmpi/snow: current puzzles, possible improvements

Ross Boylan ross at biostat.ucsf.edu
Thu May 14 06:52:06 CEST 2009


After reading through the thread around
https://stat.ethz.ch/pipermail/r-sig-hpc/2009-February/000105.html, and
looking at some other material for ideas about running snow on top of
Rmpi on Debian Lenny, I decided to try a shell script:
----------------------------------------------------------------
R_PROFILE=/usr/lib/R/site-library/snow/RMPISNOWprofile; export R_PROFILE
mpirun -np 6 -hostfile hosts R CMD BATCH snowjob.R snowjob.out
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
with this kind of snowjob.R:
-------------------------------------------------------------------
# This will only execute on the head node
cl <- getMPIcluster()
print(mpi.comm.rank(0))

quickinfo <- function() {
  list(rank=mpi.comm.rank(0), machine=Sys.info()) #system("hostname"))
}
print(clusterCall(cl, quickinfo))
stopCluster(cl)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

and hosts file
-------------------
n7 slots=3
n5 slots=0  # changing this to 2 didn't help
n4 slots=4
^^^^^^^^^^^^^^^^^^^

I'm on n7.

Two problems.

First, the job shown never terminates. snowjob.out shows the standard R
banner, a standard harmless complaint, and then nothing (technically it
shows 
[n7:14829] OOB: Connection to HNP lost
but I assume that is after I ^c my shell script).

I suspect the problem is that it's having trouble reaching the other
nodes.
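
One way to test that suspicion without involving R's MPI layer at all is
to run plain `hostname` under mpirun with the same -np and -hostfile as
the real job. The sketch below only wraps that command; the function
names are made up for illustration, and it assumes mpirun is on the PATH
wherever count_hosts() is actually called.

```r
# Build the mpirun arguments for a sanity check that runs `hostname`
# instead of R, using the same -np and -hostfile as the real job.
hostname_check_args <- function(np, hostfile) {
  c("-np", as.character(np), "-hostfile", hostfile, "hostname")
}

# Run the check and count how many processes reported each host name.
# Only call this where mpirun is available (assumption).
count_hosts <- function(np, hostfile) {
  out <- system2("mpirun", hostname_check_args(np, hostfile), stdout = TRUE)
  table(out)
}
```

If count_hosts(6, "hosts") hangs or fails as well, the problem is
probably in the MPI setup rather than in snow or Rmpi.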

Second, if I set n7 slots=7 in the hosts file, the job completes, and
the output shows every process on n7.  However, if I use
machine=system("hostname") I get back null strings, even though
system("hostname") works fine interactively.

Perhaps this is some kind of quoting effect when system("hostname") is
exported via clusterCall?  Or system() doesn't work under Rmpi?
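
One thing worth ruling out first: by default system() does not return
the command's output. With intern=FALSE its value is the exit status,
and the output goes to the calling process's stdout, so nothing useful
travels back through clusterCall. The sketch below shows two ways of
getting the host name as an R value. The name hostinfo is made up for
illustration, and I am not claiming this explains the null strings.

```r
# Return the host name as an R value, two ways.
hostinfo <- function() {
  list(
    # intern = TRUE captures the command's stdout as a character vector
    via_system  = system("hostname", intern = TRUE),
    # Sys.info() is a named character vector; "nodename" is the host name
    via_sysinfo = Sys.info()[["nodename"]]
  )
}

info <- hostinfo()
print(info)
```

Either element could stand in for the machine= entry in quickinfo().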

I'm also not sure why I am not running into a 3rd problem: it looks as
if each process should be writing to the same file snowjob.out (via NFS
mounts).  That doesn't seem to be happening.  Perhaps because the slave
R processes never make it out of the RMPISNOWprofile code?
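
If the shared output file does turn out to be a problem, one way around
it would be to give each R process its own log file, named from the host
and process id. This is only a sketch: per_process_logfile is a made-up
helper, the file goes to tempdir() here just so the example is
self-contained, and where such a sink() call would belong (job script or
profile) is left open.

```r
# Build a log file name that is unique per host and per R process.
per_process_logfile <- function(prefix = "snowjob") {
  sprintf("%s-%s-%d.out", prefix, Sys.info()[["nodename"]], Sys.getpid())
}

# Divert this process's R output to its own file, then restore it.
logfile <- file.path(tempdir(), per_process_logfile())
sink(logfile)
print("output from this process only")
sink()
```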

If anyone has any thoughts or suggestions, I'd love to hear them.

Ross

P.S. The original problem is that, apparently, makeCluster(n,
type="MPI") will not spawn jobs on other nodes, and may not even spawn
more than one job at all.  So I'm attempting to bring up snow within an
MPI session.

I did notice the docs on MPI_COMM_SPAWN
http://www.mpi-forum.org/docs/mpi21-report-bw/node202.htm#Node202
indicate there is an info argument which could contain system-dependent
information.  Presumably this could include a hostname; the standard
explicitly leaves this to the implementation.  I couldn't find anything
on the openmpi implementation.  I suppose the source would at least
indicate what works now.

So, IF openmpi supports it, and if the interface is exposed through Rmpi
(which does have mpi.info functions, which might be able to make the
right arguments), there would be a possibility of handling this strictly
within R.
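
To make the idea concrete, here is a rough sketch of building such an
info object with Rmpi's mpi.info.create and mpi.info.set. The MPI
standard reserves a "host" key for spawn, but whether Open MPI on this
system honours it is an assumption I have not checked, and
make_host_info is a made-up name.

```r
# Build an MPI info object that carries a target host name.
# ASSUMPTION: the MPI implementation honours the "host" key at spawn time.
make_host_info <- function(host, info = 0) {
  if (!requireNamespace("Rmpi", quietly = TRUE)) {
    stop("Rmpi is not installed")
  }
  Rmpi::mpi.info.create(info)             # create info object number `info`
  Rmpi::mpi.info.set(info, "host", host)  # attach the key/value pair
  info                                    # return the info object number
}
```

As far as I can tell from the Rmpi documentation, the returned number
could then be given as the info argument of mpi.comm.spawn; whether
snow's makeCluster can pass it through is a separate question.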


