[R-sig-hpc] doParallel: RNG not reproducible

Cristian Bologa CBo|og@ @end|ng |rom @@|ud@unm@edu
Thu Feb 6 16:41:32 CET 2020


Hi Frank,

Please have a look at the doRNG package.

https://cran.r-project.org/web/packages/doRNG/vignettes/doRNG.pdf
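For your example, a minimal sketch could look like the following. It assumes the doRNG package is installed. `%dorng%` replaces `%dopar%` and, seeded once with set.seed() on the master, assigns each loop iteration its own RNG stream, so the draws should no longer depend on which worker happens to run which iteration. The helper name run_dorng() is hypothetical.

```r
library(doParallel)  # also attaches foreach, iterators and parallel
library(doRNG)       # provides the %dorng% operator

# Hypothetical helper: run the 12-iteration example once using doRNG
run_dorng <- function(n_workers = 8L, seed = 2373632L) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))
  registerDoParallel(cl)
  set.seed(seed)  # %dorng% derives the per-iteration streams from this state
  foreach(i = seq_len(n_workers + n_workers %/% 2L),
          .combine = "cbind") %dorng% {
    c(runif(1), rnorm(1))
  }
}

res_a <- run_dorng()
res_b <- run_dorng()
identical(res_a, res_b)
```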

Regards,
Cristian


Cristian Bologa, Ph.D.
Research Professor,
Div. of Translational Informatics, 
Dept. of Internal Medicine,
Univ. of New Mexico, School of Medicine,
Innovation Discovery&Training Center, MSC09 5025, 
700 Camino de Salud NE, Albuquerque, NM 87131
Phone: +1 (505) 925-7534
Fax:+1 (505) 925-7625
--------------------------
"True (artificial) intelligence is not the ability to give an answer, but to ask the right question"



-----Original Message-----
From: R-sig-hpc [mailto:r-sig-hpc-bounces using r-project.org] On Behalf Of Frank Weber
Sent: Thursday, February 06, 2020 3:00 AM
To: r-sig-hpc using r-project.org
Subject: [R-sig-hpc] doParallel: RNG not reproducible


Hi everyone,

I am unsure how to set up the package "doParallel" correctly to get reproducible results in random number generation (RNG). If I run the following code repeatedly, each time in a fresh R session, the stopifnot() check eventually fails with an error, indicating that the results have changed:

### Start R code
library(doParallel)  # also attaches foreach, iterators and parallel

n_slaves <- 8L
cl_obj <- makeCluster(n_slaves)
registerDoParallel(cl_obj)
# One independent L'Ecuyer-CMRG stream per worker
clusterSetRNGStream(cl_obj, iseed = 2373632L)

# 12 iterations (8 + 4) on 8 workers
rng_res <- foreach(
  icount(as.integer(n_slaves + floor(n_slaves / 2))),
  .combine = "cbind"
) %dopar% {
  c(runif(1), rnorm(1))
}
stopCluster(cl_obj)

# Save the first result; compare every later run against it
if (!file.exists("rng_res.rds")) {
  saveRDS(rng_res, file = "rng_res.rds")
} else {
  rng_res_old <- readRDS(file = "rng_res.rds")
  stopifnot(identical(rng_res, rng_res_old))
}
### End R code
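A quick way to check whether each iteration runs on the same worker in every run is to record the process ID that evaluates it. This is a diagnostic sketch only: the helper name worker_of_task() is hypothetical, and the short random delay is an assumption added to mimic uneven task run times.

```r
library(doParallel)

# Hypothetical helper: which worker process evaluates each iteration?
worker_of_task <- function(cl, n_tasks = 12L) {
  registerDoParallel(cl)
  foreach(i = seq_len(n_tasks), .combine = "c") %dopar% {
    Sys.sleep(runif(1, 0, 0.05))  # small random delay (illustrative assumption)
    Sys.getpid()
  }
}

cl <- makeCluster(8L)
pids_a <- worker_of_task(cl)
pids_b <- worker_of_task(cl)
stopCluster(cl)

# TRUE if at least one iteration ran on a different worker the second time
any(pids_a != pids_b)
```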

When I compare two runs with differing results in detail, it appears that the assignment of computational tasks (i.e., loop iterations) to cluster workers changes between runs. For example, in one run, I get:

### Start output
      result.1   result.2  result.3   result.4   result.5  result.6   result.7  result.8  result.9 result.10  result.11
[1,] 0.8720487  0.4791119 0.7671285  0.2306335  0.2470827 0.7042595  0.2103175 0.6149857 0.2153797 0.5944501  0.1431205
[2,] 1.3970093 -2.1914685 0.2847861 -2.1083101 -1.0850567 0.1582748 -1.2820137 0.2153303 0.9401810 0.5049244 -1.1084520
       result.12
[1,]  0.53079192
[2,] -0.05597698
### End output

and in another run, I get:

### Start output
      result.1   result.2  result.3   result.4   result.5  result.6   result.7  result.8  result.9 result.10   result.11
[1,] 0.8720487  0.4791119 0.7671285  0.2306335  0.2470827 0.7042595  0.2103175 0.6149857 0.2153797 0.5944501  0.53079192
[2,] 1.3970093 -2.1914685 0.2847861 -2.1083101 -1.0850567 0.1582748 -1.2820137 0.2153303 0.9401810 0.5049244 -0.05597698
      result.12
[1,]  0.1431205
[2,] -1.1084520
### End output

As one can see, columns 11 and 12 are swapped. It therefore seems to me that the assignment of computational tasks to cluster workers is not fixed. For the package "doMPI", the documentation states that this is handled by the argument "defaultopts$seed" of startMPIcluster(). Is there a similar function, argument, or option in "doParallel"? According to the documentation of "doParallel", there is none. How, then, do I get reproducible results with "doParallel"?
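Without an extra package, the idea of one stream per iteration rather than one per worker can be sketched with the parallel package alone, similar in spirit to what doRNG does. The master builds one L'Ecuyer-CMRG stream seed per iteration with nextRNGStream(), and each iteration installs its own seed before drawing, so its values depend only on the iteration index and not on which worker evaluates it. This replaces the per-worker streams from clusterSetRNGStream(); the helper name run_task_streams() is hypothetical.

```r
library(doParallel)  # also attaches foreach, iterators and parallel

# Hypothetical helper: one RNG stream per iteration instead of per worker
run_task_streams <- function(n_workers = 8L, seed = 2373632L) {
  n_tasks <- n_workers + n_workers %/% 2L

  # Build the per-iteration stream seeds on the master,
  # restoring the master's previous RNG kinds on exit
  old_kind <- RNGkind("L'Ecuyer-CMRG")
  on.exit(do.call(RNGkind, as.list(old_kind)), add = TRUE)
  set.seed(seed)
  seeds <- vector("list", n_tasks)
  seeds[[1]] <- .Random.seed
  for (i in seq_len(n_tasks)[-1]) {
    seeds[[i]] <- nextRNGStream(seeds[[i - 1]])
  }

  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)
  registerDoParallel(cl)

  foreach(s = seeds, .combine = "cbind") %dopar% {
    # .Random.seed encodes the RNG kind, so this switches the worker to
    # L'Ecuyer-CMRG and positions it at the start of this iteration's stream
    assign(".Random.seed", s, envir = globalenv())
    c(runif(1), rnorm(1))
  }
}

res_a <- run_task_streams()
res_b <- run_task_streams()
identical(res_a, res_b)
```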

My sessionInfo():

### Start output
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] doParallel_1.0.15 iterators_1.0.12  foreach_1.4.7

loaded via a namespace (and not attached):
[1] compiler_3.6.2   tools_3.6.2      codetools_0.2-16
### End output

Note: I am using RStudio, in case this is relevant.

Thanks in advance and best regards,
Frank Weber

_______________________________________________
R-sig-hpc mailing list
R-sig-hpc using r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc


