[R-sig-hpc] Help with doMPI on multiple cores on a cluster

Stephen Weston stephen.b.weston at gmail.com
Mon Oct 21 15:55:55 CEST 2013

Hi Srihari,

I suspect it's an MPI issue.  Are you able to run any other simple MPI
programs successfully, and specifically, any using R with Rmpi?  From
the error message, it appears that you're using Intel MPI, which I've
never used.  I believe Rmpi is primarily tested with Open MPI, which
is what I've always used with doMPI.  It would be interesting to see
if you can run successfully using Open MPI, if that is possible for
you.

You'll probably need to look for help on an Intel MPI forum, although
you may need to reduce the problem to something that doesn't use R.

Here is a similar issue that I found on an Intel MPI forum:


You could also try running without spawning, since that may be a
problem for Intel MPI.  To do that, change the R script to use:

    cl <- startMPIcluster()

Also change the mpirun command in the PBS script to use '-n 32', or
don't specify the -n option at all.  In that case, mpirun will start
all of the workers as well as the master, which may work better.
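For reference, a sketch of what the mpirun line in the PBS script would look like in non-spawn mode (assuming the same script name as in your original job file):

```shell
# Non-spawn mode: mpirun launches all 32 processes itself.
# Rank 0 becomes the doMPI master and the remaining ranks become
# cluster workers picked up by startMPIcluster() with no count.
time mpirun -n 32 R --slave -f ParallelAnalysis.R
```

This avoids MPI_Comm_spawn entirely, which is the call that Intel MPI may be choking on.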


Steve Weston

On Mon, Oct 21, 2013 at 9:42 AM, Srihari Radhakrishnan
<srihari at iastate.edu> wrote:
> Hi,
> I've been trying to use doMPI to run the iterations of a for loop in
> parallel (using the foreach package) on a cluster. However, I've been
> running into issues - I think it's the way I am running the R script, but I
> could be wrong. Here's a description of the problem.
> We use a PBS scheduler to submit jobs; my script uses 2 nodes (32 cores)
> for now. I run 1 copy of the R interpreter, which internally spawns 31
> workers using R's MPI libraries. I produce below the PBS script, the R code
> (the relevant bits) and the error.
> ***Begin PBS Script***
> #!/bin/bash
> #PBS -lnodes=2:ppn=16:compute,walltime=12:00:00
> # Change to directory from which qsub command was issued
>    cd $PBS_O_WORKDIR
> # Call mpirun with 1 copy of the R interpreter. This will spawn 31
> # workers from inside the R script.
> time mpirun -n 1 R --slave -f ParallelAnalysis.R
> ***End PBS script***
> ***Begin R Script***
>  source("http://bioconductor.org/biocLite.R")
>  #MPI stuff initialization
>  library(Rmpi)
>  library(foreach)
>  library(doMPI)
>  cl <- startMPIcluster(count=31) # spawn 31 cluster workers
>  registerDoMPI(cl)
>  library(MEDIPS)
>  library(BSgenome)
> .
> .
> *more R code; variable assignments etc; no mpi stuff here*
> .
> .
> # The following code runs 100 parallel iterations using the doMPI backend
> # registered above and collects the results in x, a table with one row
> # per iteration.
> x <- foreach(i=1:100, .combine='rbind') %dopar% {
> *stuff to do inside loop*
> }
> write.table(x, "output.tsv") # write x to a file
> ***End R script***
> The execution halts as soon as the libraries are loaded - I get the
> following error message repeatedly from both nodes (node 203 and node 202)
> *[2:node203] unexpected disconnect completion event from dynamic process
> with rank=0 pg*
> *_id=kvs_17890_0 0x1fce600*
> *Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0*
> I am not sure if this is an issue with the compilers or the script itself.
> The script runs successfully without using MPI (using only 1 node). Any
> help would be highly appreciated.
> Thanks in advance,
> Srihari
> --
> Srihari Radhakrishnan
> Ph.D candidate
> Valenzuela Lab
> Iowa State University
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
