[Bioc-devel] set.seed and BiocParallel

Martin Morgan mtmorg@n@b|oc @end|ng |rom gm@||@com
Wed Mar 13 10:20:45 CET 2019

Aaron points in the right direction: generate the random number streams in the serial part of the program, then send them to the workers in a consistent way. Use nextRNGStream() from the parallel package (see ?nextRNGStream) to generate a stream for each replicate, and assign that stream to .Random.seed on the worker. This will probably generate a BiocCheck warning, but it is OK so long as the top-level generation of streams on the manager is under the control of the user. For example, there is no need for your function to call `set.seed()`; users who want reproducibility can call it themselves in their own code.
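
A minimal sketch of that approach, under some assumptions: fitModel() is a hypothetical stand-in for the real model fit, the streams are L'Ecuyer-CMRG streams (the kind nextRNGStream() expects), and Map() runs the jobs serially so the example needs only base R. It is an illustration rather than a definitive implementation.

```r
library(parallel)  # base R package; provides nextRNGStream()

# Hypothetical stand-in for a stochastic model fit on one feature.
fitModel <- function(x) mean(sample(x, replace = TRUE))

fitModels <- function(object) {
    # Temporarily switch the manager to L'Ecuyer-CMRG. R seeds the new
    # generator from the current one, so a set.seed() call made by the
    # user beforehand still determines everything that follows.
    oldKind <- RNGkind("L'Ecuyer-CMRG")
    on.exit(RNGkind(oldKind[1]))

    # One stream per feature, generated serially on the manager.
    streams <- vector("list", length(object))
    s <- get(".Random.seed", envir = globalenv())
    for (i in seq_along(object)) {
        s <- nextRNGStream(s)
        streams[[i]] <- s
    }

    # Each job installs its own stream before doing any random work.
    # Map() runs serially here; with BiocParallel, bpmapply(...,
    # SIMPLIFY = FALSE, BPPARAM = BPPARAM) could take its place.
    Map(function(x, stream) {
        assign(".Random.seed", stream, envir = globalenv())
        fitModel(x)
    }, object, streams)
}
```

The function never calls set.seed() itself. A user who wants reproducible results calls set.seed() before fitModels(); because each feature carries its own stream, the result does not depend on how jobs are assigned to workers.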


On 3/12/19, 10:37 PM, "Bioc-devel on behalf of Aaron Lun" <bioc-devel-bounces using r-project.org on behalf of infinite.monkeys.with.keyboards using gmail.com> wrote:

    I think Kylie is saying that she wants to use the same seed for each 
    feature across different runs, but the seed can be different across 
    features - which would make more sense.
    Multi-worker reproducibility is an issue that we discussed before (the 
    link goes into the middle of the thread):
    The key thing is that, in addition to reproducibility, there is the 
    issue of correctness with guaranteed independent streams.
    Some food for thought: in the vast majority of my parallelized 
    applications, the heavy lifting (including the RNG'ing) is done in C++. 
    If this is also the case for you, consider using the dqrng package to 
    provide the C++ PRNG. I usually generate all my seeds in the serial part 
    of the code, and then distribute seeds to the jobs where each job is set 
    to a different "stream" value so that the sequence of random numbers is 
    always different, regardless of the seed. As the serial seed generation 
    is under the control of set.seed(), this provides correctness and 
    reproducibility no matter how the jobs are distributed across workers.
    On 12/03/2019 17:42, Kasper Daniel Hansen wrote:
    > But why do you want the same seed for the different features? That is not
    > the right way to use stochastic methods.
    > Best,
    > Kasper
    > On Tue, Mar 12, 2019 at 5:20 PM Bemis, Kylie <k.bemis using northeastern.edu>
    > wrote:
    >> Hi all,
    >> I remember similar questions coming up before, but couldn’t track any down
    >> that directly pertain to my situation.
    >> Suppose I want to use bplapply() in a function to fit models to many
    >> features, and I am applying over features. The models are stochastic, and I
    >> want the results to be reproducible, and preferably use the same RNG seed
    >> for each feature. So I could do:
    >> fitModels <- function(object, seed=1, BPPARAM=bpparam()) {
    >>   bplapply(object, function(x) {
    >>     set.seed(seed)
    >>     fitModel(x)
    >>   }, BPPARAM=BPPARAM)
    >> }
    >> But the BioC guidelines say not to use set.seed() inside function code,
    >> and I’ve seen other questions answered saying not to use “seed” as a
    >> function parameter in this way.
    >> Is it preferable to check and modify .Random.seed directly, or is there
    >> some other standard way of doing this?
    >> Thanks,
    >> Kylie
    >> ~~~
    >> Kylie Ariel Bemis
    >> Khoury College of Computer Sciences
    >> Northeastern University
    >> https://kuwisdelu.github.io
    >> _______________________________________________
    >> Bioc-devel using r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/bioc-devel