[R-sig-hpc] snow/snowfall: advantages of MPI over the socket method?

Vinh Nguyen vqnguyen at uci.edu
Thu Jan 7 01:40:32 CET 2010


Also, suppose I launch R instances of RMPISNOW via mpirun on a node
(something along the lines of "mpirun -np 3 --host myhost RMPISNOW").
Using "ps augx", I see that two R instances are started (not counting
the master process).  In RMPISNOW, executing sfInit(cpus=2,
parallel=TRUE, type="MPI") opens two more.  The following is a snippet
from ps augx:

vqnguyen 12481 97.2  0.4 315320 71108 ?        R    16:24   0:49
/apps/R/2.10.0/lib64/R/bin/exec/R --slave --no-restore
--file=/apps/R/2.10.0/lib64/R/library/snow/RMPInode.R --args
SNOWLIB=/apps/R/2.10.0/lib64/R/library OUT=/dev/null
vqnguyen 12480 97.2  0.4 313088 68872 ?        R    16:24   0:49
/apps/R/2.10.0/lib64/R/bin/exec/R --slave --no-restore
--file=/apps/R/2.10.0/lib64/R/library/snow/RMPInode.R --args
SNOWLIB=/apps/R/2.10.0/lib64/R/library OUT=/dev/null
vqnguyen 12467 74.6  0.1 258704 29844 ?        S    16:22   2:33
/apps/R/2.10.0/lib64/R/bin/exec/R --no-save
vqnguyen 12470 74.6  0.1 258700 29844 ?        S    16:22   2:33
/apps/R/2.10.0/lib64/R/bin/exec/R --no-save

The first two processes listed are spawned after sfInit() is run.

Is this how things should look?  I was expecting only two slaves.
Can anyone confirm?  Thanks.
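
For what it's worth, a check along these lines should show who is
actually doing work (a minimal sketch; it assumes snowfall is attached
and sfInit() was called as above):

  ## Ask each worker for its PID and host; compare against ps output.
  info <- sfClusterCall(function()
      c(pid = Sys.getpid(), host = Sys.info()[["nodename"]]))
  print(info)   # one entry per active worker
  sfCpus()      # should report 2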

Vinh

On Wed, Jan 6, 2010 at 3:22 PM, Vinh Nguyen <vqnguyen at uci.edu> wrote:
> Dear R-HPC list,
>
> I've been using snowfall via the socket method quite a bit over the
> last year for many of my simulation studies.  The cluster I work on
> runs SGE, and I've been extracting information from the "qhost"
> command to find idle nodes on which to spawn R instances via sfInit()
> (see the sketch below).  Yes, I know this may turn some heads among
> system admins, but I've been diligent about not hogging the shared
> resources -- this workflow has served me well.
>
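> The gist of that workflow is roughly the following (a rough sketch;
> the qhost parsing and the load threshold are specific to my cluster
> and purely illustrative):
>
>   library(snowfall)
>
>   lines <- system("qhost", intern = TRUE)
>   ## Drop the header, separator, and "global" rows (illustrative;
>   ## the layout may differ across SGE versions), then keep hosts
>   ## whose load average (column 4) is low.
>   qh   <- read.table(textConnection(lines[-(1:3)]),
>                      stringsAsFactors = FALSE)
>   load <- suppressWarnings(as.numeric(qh$V4))  # "-" becomes NA
>   idle <- qh$V1[!is.na(load) & load < 0.5]
>
>   ## Spawn one socket worker per idle host.
>   sfInit(parallel = TRUE, cpus = length(idle),
>          type = "SOCK", socketHosts = idle)
>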
> I've also been working with a system admin at my school to try to get
> OpenMPI running with snow/snowfall.  At first we were going to go
> with the LAM/MPI implementation for use with the "sfCluster" Unix
> program, but we decided on OpenMPI since it is actively developed
> (and can work with SGE).  I think things are working.  I am aware
> that you can run batch programs using this method, but I prefer an
> interactive session (via mpirun with RMPISNOW).
>
> Some things I noticed that are different compared to the socket method:
> 1.  I declare the number of cpus and nodes before my R session is
> launched via the mpirun command.  This launches that many R instances
> on the different nodes (master + workers/slaves).
> 2.  If there is an error in the R code (e.g., object not found), the R
> session terminates.
> 3.  I use sfInit() to declare the number of cpus to use from the
> cluster created in step 1 (see the sketch below).
>
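> Concretely, the sequence looks roughly like this (the host name and
> the counts are just my example):
>
>   $ mpirun -np 3 --host myhost RMPISNOW   # 1 master + 2 slaves
>
> and then, inside the R session that comes up:
>
>   library(snowfall)
>   sfInit(cpus = 2, parallel = TRUE, type = "MPI")
>   ## ... sfLapply() calls, etc. ...
>   sfStop()
>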
> Are there any advantages to the MPI method over the socket method?
> The only advantages I see are being able to use sfCluster with
> LAM/MPI to select idle nodes in step 1, or using mpirun/RMPISNOW with
> SGE to manage the shared resources.  Aside from these, my experience
> is that the socket method is the easier and, given point 2 above,
> even the better method.  Please let me know otherwise.
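> (I suppose point 2 could be softened by trapping errors on the
> workers; a rough sketch, where slow_sim() stands in for a
> hypothetical worker function:)
>
>   safe_sim <- function(i)
>       tryCatch(slow_sim(i),                     # hypothetical
>                error = function(e) conditionMessage(e))
>   res <- sfLapply(1:1000, safe_sim)  # errors come back as strings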
>
> Thanks.
> Vinh
>


