[R-sig-hpc] doSNOW + foreach = embarrassingly frustrating computation

Marius Hofert m_hofert at web.de
Tue Dec 21 18:38:25 CET 2010


Dear HPC-expeRts,

After several days of trial and error, my minimal example still does not run. Below I summarize what I have tried so far. Any help or comments are appreciated.

The problem is simple: I would like to use the R package "foreach" with "doSNOW" (as parallel backend) to do an embarrassingly parallel computation in R on the cluster "Brutus" of ETH Zurich. My minimal example is to calculate the square roots of 1, 2, and 3 in parallel.
Brutus uses the "LSF" batch system, and I submit the job to the cluster with the command:
bsub -n 4 -R "select[model==Opteron8380]" mpirun R --no-save -q -f minimal.R
where 4 is the number of slaves plus the master, the "select..." forces the jobs to run on a certain processor type, and "minimal.R" is the file name of the minimal example(s) I tried.  
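
For reference, the result the minimal example should produce can be sketched sequentially, without any cluster. This is only a sketch: it assumes foreach's default behaviour of returning a list, and the %do% part runs only if the foreach package is installed.

```r
## Sequential reference for the minimal example: no cluster involved.
expected <- lapply(1:3, sqrt)  # list holding sqrt(1), sqrt(2), sqrt(3)

## Same loop with foreach's sequential operator %do%, if foreach is installed.
if (requireNamespace("foreach", quietly = TRUE)) {
  library(foreach)
  x <- foreach(i = 1:3) %do% sqrt(i)
  ## Compare the values only; both results are unnamed lists.
  stopifnot(isTRUE(all.equal(unlist(x), unlist(expected))))
}

print(unlist(expected))
```

If the parallel version worked, "x" in the trials below should hold the same three values.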

Below are all the different programs I tried (with corresponding output).

Any ideas?

After reading the paragraph starting with "In MPI configurations where process spawning..." in the Details section of "?makeCluster", I had high hopes that "getMPIcluster()" would do the job, but it does not (please see also https://stat.ethz.ch/pipermail/r-sig-hpc/2010-December/000863.html, where I posted what the maintainers of Brutus told me).

Cheers,

Marius


(1) First trial (check if MPI runs):

Here, I used the Rmpi minimal example as given on http://math.acadiau.ca/ACMMaC/Rmpi/sample.html. I described the output here:
https://stat.ethz.ch/pipermail/r-sig-hpc/2010-December/000861.html

I don't know whether the "Error in mpi.spawn.Rslaves() :" message is meaningful.

(2) Second trial:
Here is the code I tried to run:

## ==== snippet (2) start ====

library(doSNOW) 
library(Rmpi)
library(rlecuyer)

cl <- makeCluster(3, type = "MPI") # create cluster object with the given number of slaves 
clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
registerDoSNOW(cl) # register the cluster object with foreach
## start the work
x <- foreach(i = 1:3) %dopar% { 
   sqrt(i)
}
x 
stopCluster(cl) # properly shut down the cluster

## ==== snippet (2) end ====

Here is the corresponding output:

## ==== output (2) start ====

Sender: LSF System <lsfadmin at a6213>
Subject: Job 190250: <mpirun R --no-save -q -f m02.R> Done

Job <mpirun R --no-save -q -f m02.R> was submitted from host <brutus2> by user <hofertj> in cluster <brutus>.
Job was executed on host(s) <4*a6213>, in queue <pub.1h>, as user <hofertj> in cluster <brutus>.
</cluster/home/math/hofertj> was used as the home directory.
</cluster/home/math/hofertj> was used as the working directory.
Started at Tue Dec 21 18:02:15 2010
Results reported at Tue Dec 21 18:02:25 2010

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
mpirun R --no-save -q -f m02.R
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time   :      7.84 sec.
    Max Memory :         3 MB
    Max Swap   :        29 MB

    Max Processes  :         1
    Max Threads    :         1

The output (if any) follows:

master (rank 0, comm 1)Loading required package: foreach
Loading required package: iterators
 of size 4 is running on: a6213 
slave1 (rank 1, comm 1) of size 4 is running on: a6213 
slave2 (rank 2, comm 1) of size 4 is running on: a6213 
slave3 (rank 3, comm 1) of size 4 is running on: a6213 
> library(doSNOW) 
Loading required package: codetools
Loading required package: snow
> library(Rmpi)
> library(rlecuyer)
> 
> cl <- makeCluster(3, type = "MPI") # create cluster object with the given number of slaves 
Error in makeMPIcluster(spec, ...) : a cluster already exists 1
Calls: makeCluster -> makeMPIcluster
> clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
Error in clusterSetupRNGstream(cl, ...) : object 'cl' not found
Calls: clusterSetupRNG -> clusterSetupRNGstream
> registerDoSNOW(cl) # register the cluster object with foreach
Error in assign("data", data, pos = .foreachGlobals, inherits = FALSE) : 
  object 'cl' not found
Calls: registerDoSNOW -> setDoPar -> assign
> ## start the work
> x <- foreach(i = 1:3) %dopar% { 
+    sqrt(i)
+ }
Error in checkCluster(cl) : not a valid cluster
Calls: %dopar% -> <Anonymous> -> clusterCall -> checkCluster
> x 
Error: object 'x' not found
> stopCluster(cl) # properly shut down the cluster 
Error in stopCluster(cl) : object 'cl' not found
> 
[1] "Please use mpi.close.Rslaves() to close slaves"
[1] "Please use mpi.quit() to quit R"

## ==== output (2) end ====
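
In the output above, only the first error (from makeCluster) looks informative; the later ones appear to follow from "cl" never having been created. A minimal sketch of guarding the later steps is given below. Note that failing_setup() is a hypothetical stand-in that merely reproduces the error message; it is not a snow or Rmpi function and does not touch MPI.

```r
## Hypothetical stand-in for the call that failed above; it only
## reproduces the error message and does not touch MPI.
failing_setup <- function() stop("a cluster already exists 1")

## Try to create the cluster; on failure, report once and return NULL.
cl <- tryCatch(
  failing_setup(),
  error = function(e) {
    message("cluster setup failed: ", conditionMessage(e))
    NULL
  }
)

## Run the dependent steps only if a cluster object exists.
if (is.null(cl)) {
  status <- "skipped"  # nothing to register, nothing to stop
} else {
  status <- "ready"    # here: clusterSetupRNG(), registerDoSNOW(), foreach
}
print(status)
```

In the real script, failing_setup() would be replaced by the actual cluster-creation call, so that a failed setup produces one message instead of a chain of "object 'cl' not found" errors.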

(3) Third trial:

## ==== snippet (3) start ====

library(doSNOW) 
library(Rmpi)
library(rlecuyer)

cl <- makeCluster() # create cluster object
clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
registerDoSNOW(cl) # register the cluster object with foreach
## start the work
x <- foreach(i = 1:3) %dopar% { 
   sqrt(i)
}
x 
stopCluster(cl) # properly shut down the cluster

## ==== snippet (3) end ====

## ==== output (3) start ====

No output. The script starts but "hangs" somewhere; the job had to be killed.

## ==== output (3) end ====

(4) Fourth trial:

## ==== snippet (4) start ====

library(doSNOW) 
library(Rmpi)
library(rlecuyer)

cl <- makeMPIcluster() # create cluster object 
clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
registerDoSNOW(cl) # register the cluster object with foreach
## start the work
x <- foreach(i = 1:3) %dopar% { 
   sqrt(i)
}
x 
stopCluster(cl) # properly shut down the cluster

## ==== snippet (4) end ====

## ==== output (4) start ====

No output. The script starts but "hangs" somewhere; the job had to be killed.

## ==== output (4) end ====

(5) Fifth trial:

## ==== snippet (5) start ====

library(doSNOW) 
library(Rmpi)
library(rlecuyer)

cl <- getMPIcluster() # get the MPI cluster
clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
registerDoSNOW(cl) # register the cluster object with foreach
## start the work
x <- foreach(i = 1:3) %dopar% { 
   sqrt(i)
}
x 
stopCluster(cl) # properly shut down the cluster

## ==== snippet (5) end ====

## ==== output (5) start ====

Sender: LSF System <lsfadmin at a6227>
Subject: Job 190252: <mpirun R --no-save -q -f m04.R> Done

Job <mpirun R --no-save -q -f m04.R> was submitted from host <brutus2> by user <hofertj> in cluster <brutus>.
Job was executed on host(s) <4*a6227>, in queue <pub.1h>, as user <hofertj> in cluster <brutus>.
</cluster/home/math/hofertj> was used as the home directory.
</cluster/home/math/hofertj> was used as the working directory.
Started at Tue Dec 21 18:02:15 2010
Results reported at Tue Dec 21 18:02:26 2010

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
mpirun R --no-save -q -f m04.R
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time   :     12.01 sec.
    Max Memory :         3 MB
    Max Swap   :        29 MB

    Max Processes  :         1
    Max Threads    :         1

The output (if any) follows:

master (rank 0, comm 1) of size 4 is running on: a6227 
slave1 (rank 1, comm 1) of size 4 is running on: a6227 
slave2 (rank 2, comm 1) of size 4 is running on: a6227 
slave3 (rank 3, comm 1) of size 4 is running on: a6227 
> library(doSNOW) 
Loading required package: foreach
Loading required package: iterators
Loading required package: codetools
Loading required package: snow
> library(Rmpi)
> library(rlecuyer)
> 
> cl <- getMPIcluster() # get the MPI cluster
> clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
Error in checkCluster(cl) : not a valid cluster
Calls: clusterSetupRNG ... clusterSetupRNGstream -> clusterApply -> staticClusterApply -> checkCluster
> registerDoSNOW(cl) # register the cluster object with foreach
> ## start the work
> x <- foreach(i = 1:3) %dopar% { 
+    sqrt(i)
+ }
Error in checkCluster(cl) : not a valid cluster
Calls: %dopar% -> <Anonymous> -> clusterCall -> checkCluster
> x 
Error: object 'x' not found
> stopCluster(cl) # properly shut down the cluster
> 
> 
[1] "Please use mpi.close.Rslaves() to close slaves"
[1] "Please use mpi.quit() to quit R"

## ==== output (5) end ====
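
The "not a valid cluster" errors above would be consistent with getMPIcluster() having returned NULL, i.e. no cluster object having been registered in this R session; I have not verified this on Brutus. A sketch of an explicit check follows. is_cluster() is a hypothetical helper, and the assumption that snow's functions test for the class "cluster" is mine.

```r
## Hypothetical helper: TRUE only for objects carrying class "cluster"
## (assumed to be what snow's cluster functions expect).
is_cluster <- function(cl) inherits(cl, "cluster")

## What I suspect getMPIcluster() returned in this trial (an assumption).
cl_missing <- NULL

## A mock object with the right class, for illustration only;
## it is not a usable snow cluster.
cl_mock <- structure(list(), class = "cluster")

print(is_cluster(cl_missing))
print(is_cluster(cl_mock))
```

In the real script, the check would be applied to the result of getMPIcluster() before clusterSetupRNG() is called.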

(6) Sixth trial:

## ==== snippet (6) start ====

library(doSNOW) 
library(Rmpi)
library(rlecuyer)

cl <- makeMPIcluster(3) # create cluster object with the given number of slaves 
clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
registerDoSNOW(cl) # register the cluster object with foreach
## start the work
x <- foreach(i = 1:3) %dopar% { 
   sqrt(i)
}
x 
stopCluster(cl) # properly shut down the cluster

## ==== snippet (6) end ====

## ==== output (6) start ====

Sender: LSF System <lsfadmin at a6262>
Subject: Job 190255: <mpirun R --no-save -q -f m06.R> Done

Job <mpirun R --no-save -q -f m06.R> was submitted from host <brutus2> by user <hofertj> in cluster <brutus>.
Job was executed on host(s) <4*a6262>, in queue <pub.1h>, as user <hofertj> in cluster <brutus>.
</cluster/home/math/hofertj> was used as the home directory.
</cluster/home/math/hofertj> was used as the working directory.
Started at Tue Dec 21 18:02:15 2010
Results reported at Tue Dec 21 18:02:24 2010

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
mpirun R --no-save -q -f m06.R
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time   :      4.90 sec.
    Max Memory :         4 MB
    Max Swap   :        29 MB

    Max Processes  :         1
    Max Threads    :         1

The output (if any) follows:

master (rank 0, comm 1) of size 4 is running on: a6262 
slave1 (rank 1, comm 1) of size 4 is running on: a6262 
slave2 (rank 2, comm 1) of size 4 is running on: a6262 
slave3 (rank 3, comm 1) of size 4 is running on: a6262 
> library(doSNOW) 
Loading required package: foreach
Loading required package: iterators
Loading required package: codetools
Loading required package: snow
> library(Rmpi)
> library(rlecuyer)
> 
> cl <- makeMPIcluster(3) # create cluster object with the given number of slaves 
Error in makeMPIcluster(3) : a cluster already exists 1
> clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
Error in clusterSetupRNGstream(cl, ...) : object 'cl' not found
Calls: clusterSetupRNG -> clusterSetupRNGstream
> registerDoSNOW(cl) # register the cluster object with foreach
Error in assign("data", data, pos = .foreachGlobals, inherits = FALSE) : 
  object 'cl' not found
Calls: registerDoSNOW -> setDoPar -> assign
> ## start the work
> x <- foreach(i = 1:3) %dopar% { 
+    sqrt(i)
Error in checkCluster(cl) : not a valid cluster
Calls: %dopar% -> <Anonymous> -> clusterCall -> checkCluster
Error: object 'x' not found
Error in stopCluster(cl) : object 'cl' not found
+ }
> x 
> stopCluster(cl) # properly shut down the cluster
> 
[1] "Please use mpi.close.Rslaves() to close slaves"
[1] "Please use mpi.quit() to quit R"

## ==== output (6) end ====


