[Bioc-devel] how to achieve reproducibility with BiocParallel regardless of number of threads and OS (set.seed is disallowed)

Lulu Chen luluchen using vt.edu
Mon Dec 31 21:23:28 CET 2018


Hi Martin,

Thanks for your help. However, setting a different number of workers
generates different results:

> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(1, RNGseed=123)))
 [1]  1.0654274 -1.2421454  1.0523311 -0.7744536  1.3081934 -1.5305223
1.1525356  0.9287607 -0.4355877  1.5055436
> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(2, RNGseed=123)))
 [1] -0.9685927  0.7061091  1.4890213 -0.4094454  0.8909694 -0.8653704
1.4642711  1.2674845 -0.2220491  2.4505322
> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(3, RNGseed=123)))
 [1] -0.96859273 -0.40944544  0.89096942 -0.86537045  1.46427111
1.26748453 -0.48906078  0.43304237 -0.03195349
[10]  0.14670372
> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(4, RNGseed=123)))
 [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237
-0.03195349 -1.03886641  1.57451249  0.74708204
[10]  0.67187201
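
The output above is consistent with each worker drawing from its own random
number stream, so the values a task sees depend on how the tasks are split
across workers. One possible workaround, sketched below under stated
assumptions, is to give every task its own stream instead. The sketch uses
only base R and nextRNGStream() from the parallel package; task_streams(),
run_task(), and the six-integer user_seed are hypothetical names made up for
illustration and are not part of BiocParallel. It avoids set.seed(), but it
does switch the session to the L'Ecuyer-CMRG generator and overwrites
.Random.seed, which a package function would need to restore.

```r
library(parallel)

# Switch to the generator that supports independent streams and read its
# kind code from the live state (the first element of .Random.seed).
RNGkind("L'Ecuyer-CMRG")
kind_code <- .Random.seed[1L]

# Hypothetical helper: one L'Ecuyer-CMRG stream per task, derived from six
# user-supplied integers (assumed to be valid, e.g. small positive values).
task_streams <- function(n_tasks, user_seed) {
    stopifnot(length(user_seed) == 6L)
    s <- c(kind_code, as.integer(user_seed))
    streams <- vector("list", n_tasks)
    for (i in seq_len(n_tasks)) {
        streams[[i]] <- s
        s <- nextRNGStream(s)  # jump ahead to the next independent stream
    }
    streams
}

# Hypothetical helper: install the task's stream, then run f(x).
run_task <- function(x, stream, f) {
    assign(".Random.seed", stream, envir = globalenv())
    f(x)
}

streams <- task_streams(4, user_seed = 1:6)

# Run the tasks in two different orders; each task's result depends only on
# its own stream, not on which tasks ran before it.
forward  <- mapply(run_task, 1:4, streams,
                   MoreArgs = list(f = rnorm), SIMPLIFY = FALSE)
backward <- rev(mapply(run_task, 4:1, rev(streams),
                       MoreArgs = list(f = rnorm), SIMPLIFY = FALSE))
stopifnot(identical(forward, backward))
```

In principle the same run_task() and streams could be handed to
bpmapply(run_task, 1:4, streams, MoreArgs = list(f = rnorm), BPPARAM = ...),
so that the result no longer depends on the number of workers, but I have not
verified this against every back-end.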

Best,
Lulu

On Mon, Dec 31, 2018 at 1:12 PM Martin Morgan <mtmorgan.bioc using gmail.com>
wrote:

> The major BiocParallel objects (SnowParam(), MulticoreParam()) and use of
> bplapply() allow fully repeatable randomizations, e.g.,
>
> > library(BiocParallel)
> > unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123)))
>  [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
>  [7] -1.03886641  1.57451249  0.74708204  0.67187201
> > unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123)))
>  [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
>  [7] -1.03886641  1.57451249  0.74708204  0.67187201
> > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(RNGseed=123)))
> [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
>  [7] -1.03886641  1.57451249  0.74708204  0.67187201
>
> The idea then would be to tell the user to register() such a param, or to
> write your function to accept an argument rngSeed along the lines of
>
> f = function(..., rngSeed = NULL) {
>     param = bpparam()  # user's preferred back-end
>     if (!is.null(rngSeed)) {
>         oseed = bpRNGseed(param)            # remember the current seed
>         on.exit(bpRNGseed(param) <- oseed)  # restore it when f() returns
>         bpRNGseed(param) = rngSeed
>     }
>     bplapply(1:4, rnorm, BPPARAM = param)   # use the seeded param explicitly
> }
>
> (actually, this exercise illustrates a problem with bpRNGseed<-() when the
> original seed is NULL; this will be fixed in the next day or so...)
>
> Is that sufficient for your use case?
>
> On 12/31/18, 11:24 AM, "Bioc-devel on behalf of Lulu Chen" <
> bioc-devel-bounces using r-project.org on behalf of luluchen using vt.edu> wrote:
>
>     Dear all,
>
>     I posted this question on the Bioconductor support site
>     (https://support.bioconductor.org/p/116381/) and was advised to direct
>     future correspondence there.
>
>     I plan to generate a vector of seeds (provided by users through an
>     argument of my R function) and pass them to set.seed() in each parallel
>     computation. However, set.seed() causes a warning in BiocCheck().
>
>     Someone suggested rewriting the code in C++, which is a good idea, but
>     it would take me much more time because I would have to rewrite some
>     functions from other packages, e.g. eBayes() in limma.
>
>     Hope to get more suggestions from you. Thanks a lot!
>
>     Best,
>     Lulu
>
