[Bioc-devel] how to achieve reproducibility with BiocParallel regardless of number of threads and OS (set.seed is disallowed)
Lulu Chen
luluchen at vt.edu
Mon Dec 31 21:23:28 CET 2018
Hi Martin,
Thanks for your help. However, changing the number of workers produces
different results, even with the same RNGseed:
> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(1, RNGseed=123)))
[1] 1.0654274 -1.2421454 1.0523311 -0.7744536 1.3081934 -1.5305223
[7] 1.1525356 0.9287607 -0.4355877 1.5055436
> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(2, RNGseed=123)))
[1] -0.9685927 0.7061091 1.4890213 -0.4094454 0.8909694 -0.8653704
[7] 1.4642711 1.2674845 -0.2220491 2.4505322
> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(3, RNGseed=123)))
[1] -0.96859273 -0.40944544 0.89096942 -0.86537045 1.46427111 1.26748453
[7] -0.48906078 0.43304237 -0.03195349 0.14670372
> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(4, RNGseed=123)))
[1] -0.96859273 -0.40944544 0.89096942 -0.48906078 0.43304237 -0.03195349
[7] -1.03886641 1.57451249 0.74708204 0.67187201
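
The outputs above look consistent with one random number stream per worker: tasks that share a worker continue that worker's stream, so the values change with the worker count. For comparison, here is a minimal base-R sketch of the alternative, one stream per task, using only the parallel package. It is not BiocParallel's implementation and is not meant as a BiocCheck-compliant solution (it writes .Random.seed directly). The helper names task_streams(), draw() and run() are hypothetical, and the hard-coded 10407L assumes R >= 3.6.0.

```r
library(parallel)

# Derive one L'Ecuyer-CMRG seed vector per task from a single integer.
# Assumption: 10407L encodes L'Ecuyer-CMRG + Inversion + Rejection sampling,
# which is how R >= 3.6.0 labels that generator in .Random.seed[1].
task_streams <- function(n_tasks, seed) {
    s <- c(10407L, rep(as.integer(seed), 6L))
    streams <- vector("list", n_tasks)
    for (i in seq_len(n_tasks)) {
        s <- nextRNGStream(s)   # advance to the next independent stream
        streams[[i]] <- s
    }
    streams
}

# Each task installs its own stream before drawing, so its numbers do not
# depend on which worker happens to run it.
draw <- function(n, stream) {
    assign(".Random.seed", stream, envir = globalenv())
    stats::rnorm(n)
}

# Run the same four tasks on a cluster with the given number of workers.
run <- function(workers, seed = 123L) {
    cl <- makeCluster(workers)
    on.exit(stopCluster(cl))
    unlist(clusterMap(cl, draw, 1:4, task_streams(4L, seed)))
}

# Compare the results obtained with 1 and with 2 workers.
identical(run(1L), run(2L))
```

Because the stream is attached to the task rather than to the worker, the comparison in the last line is intended to hold for any worker count.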
Best,
Lulu
On Mon, Dec 31, 2018 at 1:12 PM Martin Morgan <mtmorgan.bioc using gmail.com>
wrote:
> The major BiocParallel objects (SnowParam(), MulticoreParam()) and use of
> bplapply() allow fully repeatable randomizations, e.g.,
>
> > library(BiocParallel)
> > unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123)))
> [1] -0.96859273 -0.40944544 0.89096942 -0.48906078 0.43304237 -0.03195349
> [7] -1.03886641 1.57451249 0.74708204 0.67187201
> > unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123)))
> [1] -0.96859273 -0.40944544 0.89096942 -0.48906078 0.43304237 -0.03195349
> [7] -1.03886641 1.57451249 0.74708204 0.67187201
> > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(RNGseed=123)))
> [1] -0.96859273 -0.40944544 0.89096942 -0.48906078 0.43304237 -0.03195349
> [7] -1.03886641 1.57451249 0.74708204 0.67187201
>
> The idea then would be to tell the user to register() such a param, or to
> write your function to accept an argument rngSeed along the lines of
>
> f = function(..., rngSeed = NULL) {
>     if (!is.null(rngSeed)) {
>         param = bpparam()                   # user's preferred back-end
>         oseed = bpRNGseed(param)            # remember the current seed
>         on.exit(bpRNGseed(param) <- oseed)  # restore it when f() returns
>         bpRNGseed(param) = rngSeed          # use the caller's seed
>     }
>     bplapply(1:4, rnorm)
> }
>
> (actually, this exercise illustrates a problem with bpRNGseed<-() when the
> original seed is NULL; this will be fixed in the next day or so...)
>
> Is that sufficient for your use case?
>
> On 12/31/18, 11:24 AM, "Bioc-devel on behalf of Lulu Chen" <
> bioc-devel-bounces using r-project.org on behalf of luluchen using vt.edu> wrote:
>
> Dear all,
>
> I posted this question on the Bioconductor support site
> (https://support.bioconductor.org/p/116381/) and was advised to direct
> future correspondence there.
>
> I plan to generate a vector of seeds (provided by users through an
> argument of my R function) and use them with set.seed() in each parallel
> computation. However, set.seed() causes a warning in BiocCheck().
>
> Someone suggested rewriting the code in C++, which is a good idea, but
> it would take me much more time to rewrite some functions from other
> packages, e.g. eBayes() in limma.
>
> Hope to get more suggestions from you. Thanks a lot!
>
> Best,
> Lulu