[R] R and Openmpi

Dirk Eddelbuettel edd at debian.org
Sat May 31 20:23:37 CEST 2008


Paul,

On 30 May 2008 at 15:47, Paul Hewson wrote:
| Hello,
| 
| We have R working with Rmpi/openmpi, but I'm a little worried.   Specifically, (a) the -np flag doesn't seem to override the hostfile (it works fine with fortran hello world) and (b) I appear to have twice as many processes running as I think I should.
| 
| Rmpi version 0.5.5
| Openmpi version 1.1

That's old. Open MPI 1.2.* fixed and changed a lot of things. I am happy with
1.2.6, the default on Debian.

| Viglen HPC with (effectively) 9 blades and 8 nodes on each blade.
| myhosts file contains details of the 9 blades, but specifies that there are 4 slots on each blade (to make sure I leave room for other users).
| 
| When running mpirun -bynode -np 2 -hostfile myhosts R --slave --vanilla  task_pull.R
|
| 1.   I get as many R slaves as there are slots defined in my myhosts file (there are 36 slots defined, and I get 36 slaves), regardless of the setting of -np; the master goes on the first machine in the myhosts file.
| 2.   The .Rout file confirms that I have 1 comm with 1 master and 36 slaves
| 3.   When I top each blade it indicates that there are in fact 8 processes running on each blade and
| 4.   When I pstree each blade it indicates that there are two orted processes, each with 4 subprocesses.
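
As an aside, the per-host slot limit you describe lives in the hostfile
itself; an Open MPI hostfile like your myhosts (the hostnames below are mere
placeholders, not your actual machines) would look like

  blade01 slots=4
  blade02 slots=4
  blade03 slots=4
  [and so on for the remaining blades]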

You never showed us task_pull.R ... And as I readily acknowledge that this
can be tricky, why don't you experiment with a simpler setting?  Consider this
token littler [1] invocation (or use Rscript if you prefer / only have that):

  edd at ron:~> r -e'library(Rmpi); cat("Hello rank", mpi.comm.rank(0), "size", mpi.comm.size(0), "on", mpi.get.processor.name(), "\n")'
  Hello rank 0 size 1 on ron
  edd at ron:~>

So without an outer mpirun (or orterun as the Open MPI group now calls it) we
get one instance. Makes sense.  

Now with two hosts defined on the fly, and two instances each:

  edd at ron:~> orterun -n 4 -H ron,joe r -e'library(Rmpi); cat("Hello rank", mpi.comm.rank(0), "size", mpi.comm.size(0), "on", mpi.get.processor.name(), "\n")'
  Hello rank 0 size 4 on ron
  Hello rank 2 size 4 on ron
  Hello rank 3 size 4 on joe
  Hello rank 1 size 4 on joe
  edd at ron:~>

Adding '-bynode' and using '-np 4' instead of '-n 4' does not change anything.
 
| From the point of view of getting a job done this ***seems*** OK (it's running very quickly), but it doesn't seem quite right - given I'm sharing the machine with other users and so on.   Is there something I've missed in the usage of mpirun with R/Rmpi?

I cannot quite determine from what you said here what your objective is.
What exactly are you trying to do that you are not getting done?  Using fewer
instances?  Maybe that is in fact an Open MPI 1.2.* versus 1.1.* issue.
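
For comparison, the usual Rmpi task-pull skeleton looks roughly like the
following -- a hedged sketch only, not your script, and the slave count of 4
is just an example:

  library(Rmpi)

  ## Spawn a fixed number of R slaves; when spawning this way it is this
  ## call, not mpirun's -np, that really decides the slave count.
  mpi.spawn.Rslaves(nslaves = 4)

  ## Have each slave report who and where it is.
  print(mpi.remote.exec(paste("Slave", mpi.comm.rank(),
                              "on", mpi.get.processor.name())))

  ## Tear down cleanly.
  mpi.close.Rslaves()
  mpi.quit()

If your task_pull.R spawns slaves itself in this fashion, that would go some
way towards explaining why -np appeared to have no effect.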

One thing to note is that if you wrap all this in the excellent snow package
by Tierney et al, then Open MPI's '-n' can always be one, as you determine
from _within_ R how many nodes you want:

  edd at ron:~> orterun -bynode -np 1 -H ron,joe r -e'library(snow); cl <- makeCluster(4, "MPI"); res <- clusterCall(cl, function() Sys.info()["nodename"]); print(do.call(rbind, res))'
  Loading required package: utils
  Loading required package: Rmpi
          4 slaves are spawned successfully. 0 failed.
       nodename
  [1,] "joe"
  [2,] "ron"
  [3,] "joe"
  [4,] "ron"
  edd at ron:~>

Note the outer '-np 1' and the inner makeCluster(4, "MPI") which give you 4
slaves.  If you use a larger '-n $N' you will get $N instances, each spawning
as many slaves as makeCluster asks for.
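
One more thing: whichever route you take, shut the cluster down explicitly
when you are done, or you can leave orted processes behind -- which might
also account for part of what pstree showed you, though that is a guess on
my part, not a diagnosis.  With snow the tidy ending is simply:

  stopCluster(cl)   # terminates the spawned slaves
  mpi.quit()        # shuts down the MPI environment itself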

Hope this helps, Dirk

[1] Littler can be had via Debian / Ubuntu or from
http://dirk.eddelbuettel.com/code/littler.html

-- 
Three out of two people have difficulties with fractions.



More information about the R-help mailing list