From d@v|dp@tr|ck|undqu|@t @end|ng |rom gm@||@com Fri Dec 8 19:49:31 2023 From: d@v|dp@tr|ck|undqu|@t @end|ng |rom gm@||@com (David Lundquist) Date: Fri, 8 Dec 2023 12:49:31 -0600 Subject: [R-sig-hpc] How to attain randomness and reproducibility Message-ID: Currently, I am using the foreach package in R to do the following (in R-flavored pseudocode): *set.seed(1)* *foreach square in big grid of parameter combinations **%dopar%* * foreach i in 1:replication_count **%do%* * do some computation, output a row, and rbind it* *save the df generated by rbinding* Possible virtues of the preceding: seed is made clear Possible vices of the preceding: failure of randomness One might therefore try moving set.seed to the inner loop: *foreach square in big grid of parameter combinations %dopar%* * foreach i in 1:replication_count **%do%* *set.seed(i)* * do some computation, output a row, and rbind it* *save the df generated by rbinding* Possible virtues of the preceding: seed sequence is explicit Possible vices of the preceding: we've programmed the seeds to be equal I'd appreciate any advice, especially if there's an easy reference/vignette that can be pointed to, a la https://cran.r-project.org/web/packages/doRNG/vignettes/doRNG.pdf https://cran.r-project.org/web/packages/dqrng/vignettes/parallel.html [[alternative HTML version deleted]] From @ek@trom @end|ng |rom um|ch@edu Fri Dec 8 20:20:00 2023 From: @ek@trom @end|ng |rom um|ch@edu (Andrew Ekstrom) Date: Fri, 8 Dec 2023 19:20:00 +0000 Subject: [R-sig-hpc] How to attain randomness and reproducibility In-Reply-To: References: Message-ID: Each random seed will give you the same values in the same order, if you collect a string of values. But, when you change random seeds, you are guaranteed to get different values.... if you collect a string of values. By setting the seed to a different number each time, you only get the first random value. Depending upon the application, that first seed number might come up many times. And, those numbers will come up the same every time. Normally, you set the seed and let it ride. Neither method gives you "truely" random numbers. You can create more random data by combining some randomly generated numbers together. Say, i=(Randbetween(1,1000000)+randbetween(1, 1000000))?randbetween(1, 100000) What type of randomness are you looking for? Drew Ekstrom Analytics Expert and Math Lecturer ________________________________ From: R-sig-hpc on behalf of David Lundquist Sent: Friday, December 8, 2023 1:49:31 PM To: r-sig-hpc at r-project.org Subject: [R-sig-hpc] How to attain randomness and reproducibility Currently, I am using the foreach package in R to do the following (in R-flavored pseudocode): *set.seed(1)* *foreach square in big grid of parameter combinations **%dopar%* * foreach i in 1:replication_count **%do%* * do some computation, output a row, and rbind it* *save the df generated by rbinding* Possible virtues of the preceding: seed is made clear Possible vices of the preceding: failure of randomness One might therefore try moving set.seed to the inner loop: *foreach square in big grid of parameter combinations %dopar%* * foreach i in 1:replication_count **%do%* *set.seed(i)* * do some computation, output a row, and rbind it* *save the df generated by rbinding* Possible virtues of the preceding: seed sequence is explicit Possible vices of the preceding: we've programmed the seeds to be equal I'd appreciate any advice, especially if there's an easy reference/vignette that can be pointed to, a la https://cran.r-project.org/web/packages/doRNG/vignettes/doRNG.pdf https://cran.r-project.org/web/packages/dqrng/vignettes/parallel.html [[alternative HTML version deleted]] _______________________________________________ R-sig-hpc mailing list R-sig-hpc at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-hpc [[alternative HTML version deleted]] From henr|k@bengt@@on @end|ng |rom gm@||@com Fri Dec 8 20:22:12 2023 From: henr|k@bengt@@on @end|ng |rom gm@||@com (Henrik Bengtsson) Date: Fri, 8 Dec 2023 11:22:12 -0800 Subject: [R-sig-hpc] How to attain randomness and reproducibility In-Reply-To: References: Message-ID: Hello, these are all good questions on a very important topic, especially if we do statistics that rely on randomness. First, %do% and %dopar% are completely different creatures. That is important to take into account here. It is not possible to generate numerically reproducible random number between the two - at least not with built-in methods. Any attempts to do so would require major efforts. For %do%, which is running all sequentially in the main R session, you can use traditional techniques in R, i.e. set.seed(). This is not different from how we do it with sequential lapply(), purrr::map(), etc. If you parallelize with %dopar%, you want to replace that with %dorng% from the doRNG package. Then %dorng% will take care of producing statistically sound (L'Ecuyer-CMRG) random numbers for your parallel threads. The reason why this works is that %dorng% is a wrapper around %dopar%. It works with any registerDoNNN() adapter than the user chooses. y <- foreach(i = 1:3) %dorng% { rnorm(1) } You do *not* want to use the alternative that doRNG provides, registerDoRNG() with %dopar%. The reason why you do not want to use that, is that as a developer you loose control. registerDoRNG() can be overridden by the end-user, where with %dorng% you are guaranteed that your foreach() call will use parallel RNG from doRNG. We should never call set.seed() inside code, functions, or map-reduce calls, loops, inside package code, etc. It is not designed for that, it is not statistically sound, and may introduce biases. Instead, we should also aim for setting it at the very beginning of our R scripts, and leave it to the end-user to be able to adjust it, or remove it. If we set in an R package, there is a great risk it has a negative, unforeseeable impact elsewhere. If you use the Futureverse for parallelization (I'm the maintainer), you have a third option for foreach() by using the doFuture package (https://dofuture.futureverse.org/). Replace %dopar% with %dofuture%. Then pass argument .options.future = list(seed = TRUE) to foreach(), to declare you want to use parallel RNGs (L'Ecuyer-CMRG), e.g. y <- foreach(i = 1:3, .options.future = list(seed = TRUE)) %dofuture% { rnorm(1) } An alternative for the exact same thing, is: y <- foreach(i = 1:3) %dofuture% { rnorm(1) } %seed% TRUE This is fully compatible with how it works when using future.apply::future_lapply(), furrr::future_map(), etc. /Henrik PS. I'll hijack this thread to push out a reminder: foreach() is *not* a for loop. It's no difference from using lapply(), purrr::map(), etc. - it's just a different syntax. To better understand why this is, we should all challenge ourselves to not use terms like "for-loop" and "loop" when talking about foreach(). On Fri, Dec 8, 2023 at 10:49?AM David Lundquist wrote: > > Currently, I am using the foreach package in R to do the following (in > R-flavored pseudocode): > > *set.seed(1)* > *foreach square in big grid of parameter combinations **%dopar%* > * foreach i in 1:replication_count **%do%* > * do some computation, output a row, and rbind it* > *save the df generated by rbinding* > > Possible virtues of the preceding: seed is made clear > Possible vices of the preceding: failure of randomness > > One might therefore try moving set.seed to the inner loop: > > *foreach square in big grid of parameter combinations %dopar%* > * foreach i in 1:replication_count **%do%* > *set.seed(i)* > * do some computation, output a row, and rbind it* > *save the df generated by rbinding* > > Possible virtues of the preceding: seed sequence is explicit > Possible vices of the preceding: we've programmed the seeds to be equal > > I'd appreciate any advice, especially if there's an easy reference/vignette > that can be pointed to, a la > > https://cran.r-project.org/web/packages/doRNG/vignettes/doRNG.pdf > https://cran.r-project.org/web/packages/dqrng/vignettes/parallel.html > > [[alternative HTML version deleted]] > > _______________________________________________ > R-sig-hpc mailing list > R-sig-hpc at r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-hpc From vo|ody@ @end|ng |rom m|nd@pr|ng@com Fri Dec 8 21:35:49 2023 From: vo|ody@ @end|ng |rom m|nd@pr|ng@com (Vladimir Dergachev) Date: Fri, 8 Dec 2023 15:35:49 -0500 (EST) Subject: [R-sig-hpc] How to attain randomness and reproducibility In-Reply-To: References: Message-ID: One possibility to consider is using a hash function to get random numbers: r = hash(GLOBAL_SEED, parameter1, parameter2, parameter3) This way the random numbers r are the same for each combination of parameters. This stays so even when you change how many parameters you go over. And if you want to get a different realization you just change GLOBAL_SEED. There is a implementation of hash function in RMVL package as "mvl_hash_vectors" that returns uniformly distributed values between 1.0 and 2.0. best Vladimir Dergachev On Fri, 8 Dec 2023, David Lundquist wrote: > Currently, I am using the foreach package in R to do the following (in > R-flavored pseudocode): > > *set.seed(1)* > *foreach square in big grid of parameter combinations **%dopar%* > * foreach i in 1:replication_count **%do%* > * do some computation, output a row, and rbind it* > *save the df generated by rbinding* > > Possible virtues of the preceding: seed is made clear > Possible vices of the preceding: failure of randomness > > One might therefore try moving set.seed to the inner loop: > > *foreach square in big grid of parameter combinations %dopar%* > * foreach i in 1:replication_count **%do%* > *set.seed(i)* > * do some computation, output a row, and rbind it* > *save the df generated by rbinding* > > Possible virtues of the preceding: seed sequence is explicit > Possible vices of the preceding: we've programmed the seeds to be equal > > I'd appreciate any advice, especially if there's an easy reference/vignette > that can be pointed to, a la > > https://cran.r-project.org/web/packages/doRNG/vignettes/doRNG.pdf > https://cran.r-project.org/web/packages/dqrng/vignettes/parallel.html > > [[alternative HTML version deleted]] > > _______________________________________________ > R-sig-hpc mailing list > R-sig-hpc at r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-hpc >