[R-sig-hpc] doSNOW + foreach = embarrassingly frustrating computation

Marius Hofert m_hofert at web.de
Tue Dec 21 21:49:11 CET 2010


Okay, I ran (2) and (3) again, now with the lines

mpi.close.Rslaves()
mpi.quit()

(as suggested) at the end. Both programs stopped with:

> mpi.close.Rslaves()
Error in mpi.close.Rslaves() : It seems no slaves running on comm 1
Execution halted

I therefore ran (2) and (3) again, this time with only
mpi.quit()
at the end. The output of both runs is below.

So it seems to work! I have the following remaining questions:

(1) With "n <- mpi.universe.size()", program (3) elegantly uses all available CPUs (if I understand this command correctly). Can this always be used, or should one rather reason along the lines of "I need 3 workers, therefore I use makeCluster(3, type = "MPI")"? With "n <- mpi.universe.size()" one obviously does not have to think about the number of workers at all.
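
For reference, these are the two alternatives, taken from (2) and (3) above:

cl <- makeCluster(3, type = "MPI") # fixed number of slaves, chosen by hand
## versus
n  <- mpi.universe.size()          # number of CPUs available to the job (as used in (3))
cl <- makeCluster(n, type = "MPI")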

(2) Brian Peterson pointed out that doMPI might be a better choice. I would like to have a minimal example similar to (3), but with doMPI. Unfortunately, I couldn't find an equivalent of "clusterSetupRNG()" [from snow] for doMPI. Do you know how to set up rlecuyer with doMPI? I tried the following, but (of course) it does not work:

## snippet doMPI start ====

library(doMPI) 
library(foreach)
library(rlecuyer)

cl <- startMPIcluster()
clusterSetupRNG(cl, seed = rep(1,6)) # from snow, so it works only with a snow/doSNOW cluster; here R stops with 'Error: could not find function "clusterSetupRNG"'
registerDoMPI(cl) # register the cluster object with foreach
## start the work
x <- foreach(i = 1:3) %dopar% { 
   sqrt(i)
}
x 
stopCluster(cl) # snow's stopCluster(); doMPI clusters are shut down with closeCluster() instead
mpi.quit()

## snippet doMPI end ====
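
My best guess, which I have not been able to test, is that doMPI is supposed to handle the RNG setup itself, perhaps via a 'seed' option passed to foreach() through .options.mpi (I am not sure this option exists, so please correct me). Assuming it does, the example would become:

## snippet doMPI guess start ====

library(doMPI)
library(foreach)

cl <- startMPIcluster()
registerDoMPI(cl) # register the cluster object with foreach
## start the work; assumption: doMPI accepts a 'seed' option here and uses it to
## give each worker its own reproducible parallel random number stream
x <- foreach(i = 1:3, .options.mpi = list(seed = 1)) %dopar% {
   sqrt(i)
}
x
closeCluster(cl) # doMPI's own shutdown function (rather than snow's stopCluster())
mpi.quit()

## snippet doMPI guess end ====

If someone could confirm or correct this, that would be great.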

Cheers,

Marius

## === output of (2) start === 

Sender: LSF System <lsfadmin at a6169>
Subject: Job 195661: <mpirun -n 1 R --no-save -q -f m02.R> Done

Job <mpirun -n 1 R --no-save -q -f m02.R> was submitted from host <brutus3> by user <hofertj> in cluster <brutus>.
Job was executed on host(s) <4*a6169>, in queue <pub.1h>, as user <hofertj> in cluster <brutus>.
</cluster/home/math/hofertj> was used as the home directory.
</cluster/home/math/hofertj> was used as the working directory.
Started at Tue Dec 21 21:15:16 2010
Results reported at Tue Dec 21 21:15:27 2010

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
mpirun -n 1 R --no-save -q -f m02.R
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time   :      6.41 sec.
    Max Memory :         4 MB
    Max Swap   :        29 MB

    Max Processes  :         1
    Max Threads    :         1

The output (if any) follows:

> library(doSNOW) 
Loading required package: foreach
Loading required package: iterators
Loading required package: codetools
Loading required package: snow
> library(Rmpi)
> library(rlecuyer)
> 
> cl <- makeCluster(3, type = "MPI") # create cluster object with the given number of slaves 
	3 slaves are spawned successfully. 0 failed.
> clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
[1] "RNGstream"
> registerDoSNOW(cl) # register the cluster object with foreach
> ## start the work
> x <- foreach(i = 1:3) %dopar% { 
+    sqrt(i)
+ }
> x 
[[1]]
[1] 1

[[2]]
[1] 1.414214

[[3]]
[1] 1.732051

> stopCluster(cl) # properly shut down the cluster 
[1] 1
> mpi.quit()

## === output of (2) end === 

## === output of (3) start ===

Sender: LSF System <lsfadmin at a6169>
Subject: Job 195663: <mpirun -n 1 R --no-save -q -f m03.R> Done

Job <mpirun -n 1 R --no-save -q -f m03.R> was submitted from host <brutus3> by user <hofertj> in cluster <brutus>.
Job was executed on host(s) <4*a6169>, in queue <pub.1h>, as user <hofertj> in cluster <brutus>.
</cluster/home/math/hofertj> was used as the home directory.
</cluster/home/math/hofertj> was used as the working directory.
Started at Tue Dec 21 21:15:16 2010
Results reported at Tue Dec 21 21:15:31 2010

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
mpirun -n 1 R --no-save -q -f m03.R
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time   :     24.67 sec.
    Max Memory :       297 MB
    Max Swap   :      1934 MB

    Max Processes  :         8
    Max Threads    :        19

The output (if any) follows:

> library(doSNOW) 
Loading required package: foreach
Loading required package: iterators
Loading required package: codetools
Loading required package: snow
> library(Rmpi)
> library(rlecuyer)
> 
> n <- mpi.universe.size()
> cl <- makeCluster(n, type ="MPI") # create cluster object 
	4 slaves are spawned successfully. 0 failed.
> clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
[1] "RNGstream"
> registerDoSNOW(cl) # register the cluster object with foreach
> ## start the work
> x <- foreach(i = 1:3) %dopar% { 
+    sqrt(i)
+ }
> x 
[[1]]
[1] 1

[[2]]
[1] 1.414214

[[3]]
[1] 1.732051

> stopCluster(cl) # properly shut down the cluster
[1] 1
> mpi.quit()

## === output of (3) end ===


