[R-sig-hpc] I can do mpi.apply but not foreach with doMPI

Seija Sirkiä seija.sirkia at csc.fi
Wed Aug 26 10:12:03 CEST 2015


Hi all,

I'm trying to learn to do parallel computing with R and foreach on this cluster of ours, but clearly I'm doing something wrong and I can't figure out what.

Briefly, I'm sitting on a Linux cluster whose user guide says that the login nodes are based on RHEL 6, while the computing nodes use CentOS 6. Jobs are submitted using SLURM.

So there I go, requesting a short interactive test session using:
srun -p test -n4 -t 0:15:00 --pty Rmpi --no-save

Here Rmpi is the modified copy of R_home_dir/bin/R mentioned in the Rprofile file that comes with the Rmpi package ("This R profile can be used when a cluster does not allow spawning --- Another way is to modify R_home_dir/bin/R by adding...").
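
For what it's worth, the profile it refers to can be located from within R itself; assuming Rmpi is installed in a library on the default .libPaths(), this should print its path:

system.file("Rprofile", package = "Rmpi")  # the Rprofile shipped with the Rmpi package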

When my session starts, I get these messages:
master (rank 0, comm 1) of size 4 is running on: c1 
slave1 (rank 1, comm 1) of size 4 is running on: c1 
slave2 (rank 2, comm 1) of size 4 is running on: c1 
slave3 (rank 3, comm 1) of size 4 is running on: c1 
before the prompt. Sounds good, and if I go check top on node c1, I see three R processes churning away happily at 100% CPU time and one not doing much. As it should be, as far as I can tell?
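
As an extra sanity check (assuming the Rprofile has indeed set up the three slaves on comm 1), I can also query them directly from the master prompt:

mpi.comm.size(1)  # should report 4: the master plus 3 slaves
mpi.remote.exec(paste(Sys.info()[["nodename"]], Sys.getpid()))  # node and PID of each slave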

If I then run this little test:

funtorun <- function(k) {
  system.time(sort(runif(1e7)))
}

# parallel: the three sorts are farmed out to the slaves
system.time(a <- mpi.apply(1:3, funtorun))
a

# serial comparison: the same three sorts in a plain for loop
b <- a
system.time(for (i in 1:3) b[[i]] <- system.time(sort(runif(1e7))))
b

it goes through nicely: the mpi.apply part takes about 2.6 seconds in total, with each of the three sorts taking about that same time, while the for loop takes about 7 seconds in total, with each of its three sorts taking about 2.3 seconds. Nice, that tells me the workers will do work simultaneously when asked correctly.

But if I try this instead:

library(doMPI)
cl <- startMPIcluster()
registerDoMPI(cl)
system.time(a <- foreach(i = 1:3) %dopar% system.time(sort(runif(1e7))))

it just hangs at the foreach line and never gets through; it only gets killed at the end of the reserved 15 minutes, or when I scancel the whole job myself. None of the lines gives any error.

So what am I doing wrong? I have a hunch this has something to do with how my workers are started, since I never get to run the mpirun commands that the doMPI manual speaks of. But despite reading the manual and the documentation of startMPIcluster, I haven't figured out what else to try.
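
For reference, my reading of the doMPI documentation is that the intended pattern is a batch script roughly like the one below, launched with something like mpirun -np 4 Rscript script.R so that startMPIcluster() attaches to the already-running ranks; but that mpirun step is exactly the one I never get to do under srun:

library(doMPI)
cl <- startMPIcluster()   # with mpirun-started ranks this should attach to the existing workers
registerDoMPI(cl)
a <- foreach(i = 1:3) %dopar% system.time(sort(runif(1e7)))
closeCluster(cl)
mpi.quit()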

Many thanks in advance for your time!

BR,
Seija Sirkiä


