[R-sig-hpc] doSNOW + foreach = embarrassingly frustrating computation
Marius Hofert
m_hofert at web.de
Tue Dec 21 18:38:25 CET 2010
Dear HPC-expeRts,
After several days of trial and error, my minimal example still does not run. Below, I summarize what I have tried so far. Any help or comments are appreciated.
The problem is simple: I would like to use the R package "foreach" with "doSNOW" as the parallel backend to run an embarrassingly parallel computation in R on the cluster "Brutus" of ETH Zurich. My minimal example computes the square roots of 1, 2, and 3 in parallel.
Brutus uses the "LSF" batch system, and I submit the job to the cluster with the command:
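For reference, the result that a working parallel run should return can be computed sequentially. This is a minimal base-R sketch that needs no cluster; the name `expected` is introduced here for illustration only. It relies on the fact that foreach() returns a list when no .combine argument is given, so lapply() is the closest sequential equivalent.

```r
# Sequential reference for the minimal example (base R only, no cluster).
# foreach() without .combine returns a list, which lapply() mirrors.
expected <- lapply(1:3, sqrt)

# Show the structure: a list of three numeric values.
str(expected)
```

Comparing the parallel result `x` with `expected` (for example via all.equal()) would confirm that the workers did the computation correctly.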
bsub -n 4 -R "select[model==Opteron8380]" mpirun R --no-save -q -f minimal.R
where 4 is the number of slaves plus the master, the "select..." option forces the job to run on a specific processor type, and "minimal.R" is the file name of the minimal example I tried.
Below are all the different programs I tried (with corresponding output).
Any ideas?
After reading the paragraph starting with "In MPI configurations where process spawning..." in the Details section of "?makeCluster", I had high hopes that "getMPIcluster()" would do the job, but it doesn't (please see also https://stat.ethz.ch/pipermail/r-sig-hpc/2010-December/000863.html, where I posted what the maintainers of Brutus told me).
Cheers,
Marius
(1) First trial (check if MPI runs):
Here, I used the Rmpi minimal example given at http://math.acadiau.ca/ACMMaC/Rmpi/sample.html. I described the output here:
https://stat.ethz.ch/pipermail/r-sig-hpc/2010-December/000861.html
I don't know whether the "Error in mpi.spawn.Rslaves() :" message is significant.
(2) Second trial
Here is the code I tried to run:
## ==== snippet (2) start ====
library(doSNOW)
library(Rmpi)
library(rlecuyer)
cl <- makeCluster(3, type = "MPI") # create cluster object with the given number of slaves
clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
registerDoSNOW(cl) # register the cluster object with foreach
## start the work
x <- foreach(i = 1:3) %dopar% {
    sqrt(i)
}
x
stopCluster(cl) # properly shut down the cluster
## ==== snippet (2) end ====
Here is the corresponding output:
## ==== output (2) start ====
Sender: LSF System <lsfadmin at a6213>
Subject: Job 190250: <mpirun R --no-save -q -f m02.R> Done
Job <mpirun R --no-save -q -f m02.R> was submitted from host <brutus2> by user <hofertj> in cluster <brutus>.
Job was executed on host(s) <4*a6213>, in queue <pub.1h>, as user <hofertj> in cluster <brutus>.
</cluster/home/math/hofertj> was used as the home directory.
</cluster/home/math/hofertj> was used as the working directory.
Started at Tue Dec 21 18:02:15 2010
Results reported at Tue Dec 21 18:02:25 2010
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
mpirun R --no-save -q -f m02.R
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 7.84 sec.
Max Memory : 3 MB
Max Swap : 29 MB
Max Processes : 1
Max Threads : 1
The output (if any) follows:
master (rank 0, comm 1)Loading required package: foreach
Loading required package: iterators
of size 4 is running on: a6213
slave1 (rank 1, comm 1) of size 4 is running on: a6213
slave2 (rank 2, comm 1) of size 4 is running on: a6213
slave3 (rank 3, comm 1) of size 4 is running on: a6213
> library(doSNOW)
Loading required package: codetools
Loading required package: snow
> library(Rmpi)
> library(rlecuyer)
>
> cl <- makeCluster(3, type = "MPI") # create cluster object with the given number of slaves
Error in makeMPIcluster(spec, ...) : a cluster already exists 1
Calls: makeCluster -> makeMPIcluster
> clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
Error in clusterSetupRNGstream(cl, ...) : object 'cl' not found
Calls: clusterSetupRNG -> clusterSetupRNGstream
> registerDoSNOW(cl) # register the cluster object with foreach
Error in assign("data", data, pos = .foreachGlobals, inherits = FALSE) :
object 'cl' not found
Calls: registerDoSNOW -> setDoPar -> assign
> ## start the work
> x <- foreach(i = 1:3) %dopar% {
+ sqrt(i)
+ }
Error in checkCluster(cl) : not a valid cluster
Calls: %dopar% -> <Anonymous> -> clusterCall -> checkCluster
> x
Error: object 'x' not found
> stopCluster(cl) # properly shut down the cluster
Error in stopCluster(cl) : object 'cl' not found
>
[1] "Please use mpi.close.Rslaves() to close slaves"
[1] "Please use mpi.quit() to quit R"
## ==== output (2) end ====
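In output (2), only the first error ("a cluster already exists 1") is informative. The script carried on after it, so `cl` was never defined and every later error follows from that. A minimal sketch of one way to stop at the first failure is given below. The helper name `make_or_abort` and its `abort` argument are hypothetical, introduced only for illustration; the sketch uses base R only and makes no claim about why the cluster creation fails on Brutus.

```r
# Hypothetical helper: run a cluster constructor and abort on failure,
# so that later lines do not produce follow-up "object 'cl' not found" errors.
make_or_abort <- function(make_cluster,
                          abort = function() quit(save = "no", status = 1)) {
  tryCatch(
    make_cluster(),
    error = function(e) {
      # Report the original error message, then hand over to the abort action.
      message("cluster setup failed: ", conditionMessage(e))
      abort()
    }
  )
}
```

In the script, this could be used as `cl <- make_or_abort(function() makeCluster(3, type = "MPI"))`. The default abort action is plain quit(); since the log above asks for mpi.quit() to quit R, passing `abort = function() mpi.quit()` may be more appropriate under Rmpi, which is an untested assumption here.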
(3) Third trial
## ==== snippet (3) start ====
library(doSNOW)
library(Rmpi)
library(rlecuyer)
cl <- makeCluster() # create cluster object
clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
registerDoSNOW(cl) # register the cluster object with foreach
## start the work
x <- foreach(i = 1:3) %dopar% {
    sqrt(i)
}
x
stopCluster(cl) # properly shut down the cluster
## ==== snippet (3) end ====
## ==== output (3) start ====
No output. The script starts but "hangs" somewhere; the job had to be killed.
## ==== output (3) end ====
(4) Fourth trial
## ==== snippet (4) start ====
library(doSNOW)
library(Rmpi)
library(rlecuyer)
cl <- makeMPIcluster() # create cluster object
clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
registerDoSNOW(cl) # register the cluster object with foreach
## start the work
x <- foreach(i = 1:3) %dopar% {
    sqrt(i)
}
x
stopCluster(cl) # properly shut down the cluster
## ==== snippet (4) end ====
## ==== output (4) start ====
No output. The script starts but "hangs" somewhere; the job had to be killed.
## ==== output (4) end ====
(5) Fifth trial
## ==== snippet (5) start ====
library(doSNOW)
library(Rmpi)
library(rlecuyer)
cl <- getMPIcluster() # get the MPI cluster
clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
registerDoSNOW(cl) # register the cluster object with foreach
## start the work
x <- foreach(i = 1:3) %dopar% {
    sqrt(i)
}
x
stopCluster(cl) # properly shut down the cluster
## ==== snippet (5) end ====
## ==== output (5) start ====
Sender: LSF System <lsfadmin at a6227>
Subject: Job 190252: <mpirun R --no-save -q -f m04.R> Done
Job <mpirun R --no-save -q -f m04.R> was submitted from host <brutus2> by user <hofertj> in cluster <brutus>.
Job was executed on host(s) <4*a6227>, in queue <pub.1h>, as user <hofertj> in cluster <brutus>.
</cluster/home/math/hofertj> was used as the home directory.
</cluster/home/math/hofertj> was used as the working directory.
Started at Tue Dec 21 18:02:15 2010
Results reported at Tue Dec 21 18:02:26 2010
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
mpirun R --no-save -q -f m04.R
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 12.01 sec.
Max Memory : 3 MB
Max Swap : 29 MB
Max Processes : 1
Max Threads : 1
The output (if any) follows:
master (rank 0, comm 1) of size 4 is running on: a6227
slave1 (rank 1, comm 1) of size 4 is running on: a6227
slave2 (rank 2, comm 1) of size 4 is running on: a6227
slave3 (rank 3, comm 1) of size 4 is running on: a6227
> library(doSNOW)
Loading required package: foreach
Loading required package: iterators
Loading required package: codetools
Loading required package: snow
> library(Rmpi)
> library(rlecuyer)
>
> cl <- getMPIcluster() # get the MPI cluster
> clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
Error in checkCluster(cl) : not a valid cluster
Calls: clusterSetupRNG ... clusterSetupRNGstream -> clusterApply -> staticClusterApply -> checkCluster
> registerDoSNOW(cl) # register the cluster object with foreach
> ## start the work
> x <- foreach(i = 1:3) %dopar% {
+ sqrt(i)
+ }
Error in checkCluster(cl) : not a valid cluster
Calls: %dopar% -> <Anonymous> -> clusterCall -> checkCluster
> x
Error: object 'x' not found
> stopCluster(cl) # properly shut down the cluster
>
>
[1] "Please use mpi.close.Rslaves() to close slaves"
[1] "Please use mpi.quit() to quit R"
## ==== output (5) end ====
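In output (5), the calls `cl <- getMPIcluster()`, `registerDoSNOW(cl)` and `stopCluster(cl)` print no error, while every call that actually uses the cluster fails with "not a valid cluster". This pattern is consistent with getMPIcluster() having returned NULL, although the log does not show the value, so this is an assumption. A minimal sketch of a guard that would make this visible right away is given below. The helper name `assert_cluster` is hypothetical, and the check against class "cluster" is assumed to match what snow itself tests.

```r
# Hypothetical guard: fail early with a clear message when no usable
# cluster object came back from the constructor or getter.
assert_cluster <- function(cl) {
  if (is.null(cl)) {
    stop("no cluster object returned (got NULL)")
  }
  # Assumption: snow cluster objects inherit from class "cluster".
  if (!inherits(cl, "cluster")) {
    stop("object of class '", class(cl)[1], "' is not a cluster")
  }
  invisible(cl)
}
```

Used as `cl <- assert_cluster(getMPIcluster())`, this would stop at the line that obtains the cluster instead of at the first use of it.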
(6) Sixth trial
## ==== snippet (6) start ====
library(doSNOW)
library(Rmpi)
library(rlecuyer)
cl <- makeMPIcluster(3) # create cluster object with the given number of slaves
clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
registerDoSNOW(cl) # register the cluster object with foreach
## start the work
x <- foreach(i = 1:3) %dopar% {
    sqrt(i)
}
x
stopCluster(cl) # properly shut down the cluster
## ==== snippet (6) end ====
## ==== output (6) start ====
Sender: LSF System <lsfadmin at a6262>
Subject: Job 190255: <mpirun R --no-save -q -f m06.R> Done
Job <mpirun R --no-save -q -f m06.R> was submitted from host <brutus2> by user <hofertj> in cluster <brutus>.
Job was executed on host(s) <4*a6262>, in queue <pub.1h>, as user <hofertj> in cluster <brutus>.
</cluster/home/math/hofertj> was used as the home directory.
</cluster/home/math/hofertj> was used as the working directory.
Started at Tue Dec 21 18:02:15 2010
Results reported at Tue Dec 21 18:02:24 2010
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
mpirun R --no-save -q -f m06.R
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 4.90 sec.
Max Memory : 4 MB
Max Swap : 29 MB
Max Processes : 1
Max Threads : 1
The output (if any) follows:
master (rank 0, comm 1) of size 4 is running on: a6262
slave1 (rank 1, comm 1) of size 4 is running on: a6262
slave2 (rank 2, comm 1) of size 4 is running on: a6262
slave3 (rank 3, comm 1) of size 4 is running on: a6262
> library(doSNOW)
Loading required package: foreach
Loading required package: iterators
Loading required package: codetools
Loading required package: snow
> library(Rmpi)
> library(rlecuyer)
>
> cl <- makeMPIcluster(3) # create cluster object with the given number of slaves
Error in makeMPIcluster(3) : a cluster already exists 1
> clusterSetupRNG(cl, seed = rep(1,6)) # initialize uniform rng streams in a SNOW cluster (L'Ecuyer)
Error in clusterSetupRNGstream(cl, ...) : object 'cl' not found
Calls: clusterSetupRNG -> clusterSetupRNGstream
> registerDoSNOW(cl) # register the cluster object with foreach
Error in assign("data", data, pos = .foreachGlobals, inherits = FALSE) :
object 'cl' not found
Calls: registerDoSNOW -> setDoPar -> assign
> ## start the work
> x <- foreach(i = 1:3) %dopar% {
+ sqrt(i)
Error in checkCluster(cl) : not a valid cluster
Calls: %dopar% -> <Anonymous> -> clusterCall -> checkCluster
Error: object 'x' not found
Error in stopCluster(cl) : object 'cl' not found
+ }
> x
> stopCluster(cl) # properly shut down the cluster
>
[1] "Please use mpi.close.Rslaves() to close slaves"
[1] "Please use mpi.quit() to quit R"
## ==== output (6) end ====