[Bioc-devel] set.seed and BiocParallel
Martin Morgan
mtmorg@n@b|oc @end|ng |rom gm@||@com
Wed Mar 13 10:20:45 CET 2019
Aaron points in the right direction: generate the random number streams in the serial part of the program, then send them to the workers in a consistent way. Use nextRNGStream() from the parallel package (see ?nextRNGStream) to generate a stream for each replicate, and assign that stream to .Random.seed on the worker. This will probably generate a BiocCheck warning, but it is ok so long as the top-level generation of streams on the manager stays under the user's control; that is, your function has no need to call `set.seed()`, and users who want reproducibility can call it themselves in their own code.
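
A minimal sketch of that pattern, under stated assumptions rather than as a definitive implementation: makeStreams(), the fitModel argument and the applyFun argument are hypothetical names introduced here for illustration, and lapply() stands in for bplapply() so the example runs without BiocParallel installed. It also assumes the user has selected the "L'Ecuyer-CMRG" generator, which nextRNGStream() requires.

```r
library(parallel)  # provides nextRNGStream()

# Hypothetical helper: derive n independent L'Ecuyer-CMRG streams from the
# manager's current RNG state. The user, not this code, decides the seed.
makeStreams <- function(n) {
    stopifnot(RNGkind()[1] == "L'Ecuyer-CMRG")
    if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE))
        runif(1)  # force R to initialise .Random.seed
    streams <- vector("list", n)
    s <- get(".Random.seed", envir = globalenv())
    for (i in seq_len(n)) {
        s <- nextRNGStream(s)
        streams[[i]] <- s
    }
    streams
}

# Hypothetical wrapper: one stream per feature, installed on the worker
# just before that feature's model is fitted. No set.seed() in here.
fitModels <- function(object, fitModel, applyFun = lapply) {
    n <- length(object)
    streams <- makeStreams(n + 1L)
    res <- applyFun(seq_len(n), function(i) {
        assign(".Random.seed", streams[[i]], envir = globalenv())
        fitModel(object[[i]])
    })
    # Move the manager on to an unused stream so that a second call
    # does not hand out the same streams again.
    assign(".Random.seed", streams[[n + 1L]], envir = globalenv())
    res
}

# User code: reproducibility is requested here, outside the function.
RNGkind("L'Ecuyer-CMRG")
set.seed(123)
fits <- fitModels(as.list(1:4), function(x) x + rnorm(1))
```

To run this on real workers, applyFun could be replaced by something like function(X, FUN) bplapply(X, FUN, BPPARAM=BPPARAM). Because each feature carries its own stream, the result should not depend on which worker handles which feature.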
Martin
On 3/12/19, 10:37 PM, "Bioc-devel on behalf of Aaron Lun" <bioc-devel-bounces using r-project.org on behalf of infinite.monkeys.with.keyboards using gmail.com> wrote:
I think Kylie is saying that she wants to use the same seed for each
feature across different runs, but the seed can be different across
features - which would make more sense.

Multi-worker reproducibility is an issue that we discussed before (the
link goes into the middle of the thread):

https://stat.ethz.ch/pipermail/bioc-devel/2019-January/014505.html

The key thing is that, in addition to reproducibility, there is the
issue of correctness with guaranteed independent streams.

Some food for thought: in the vast majority of my parallelized
applications, the heavy lifting (including the RNG'ing) is done in C++.
If this is also the case for you, consider using the dqrng package to
provide the C++ PRNG. I usually generate all my seeds in the serial part
of the code, and then distribute seeds to the jobs where each job is set
to a different "stream" value so that the sequence of random numbers is
always different, regardless of the seed. As the serial seed generation
is under the control of set.seed(), this provides correctness and
reproducibility no matter how the jobs are distributed across workers.
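
A rough R-level sketch of the seed-plus-stream idea, offered as an illustration only: the workflow described above does this in C++, whereas this version uses dqrng's R interface (dqRNGkind(), dqset.seed() and dqrnorm()) and therefore needs the dqrng package installed. makeSeeds(), runJob() and simulateJobs() are hypothetical names, and lapply() stands in for bplapply().

```r
# Serial part: draw one integer seed per job from R's own RNG, so the
# user's set.seed() controls everything downstream.
makeSeeds <- function(n) {
    sample.int(.Machine$integer.max, n, replace = TRUE)
}

# Hypothetical per-job function: job i uses its own seed AND stream i,
# so two jobs get different sequences even if their seeds coincide.
# Note that dqRNGkind() changes dqrng's generator for the whole process.
runJob <- function(i, seeds) {
    dqrng::dqRNGkind("pcg64")
    dqrng::dqset.seed(seeds[i], stream = i)
    dqrng::dqrnorm(3)
}

# Hypothetical driver: seeds are generated before any work is distributed.
simulateJobs <- function(n, applyFun = lapply) {
    seeds <- makeSeeds(n)
    applyFun(seq_len(n), runJob, seeds = seeds)
}
```

Calling set.seed(42) followed by simulateJobs(3) twice should give the same list both times, since the seeds depend only on the serial RNG state and the stream index depends only on the job number.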
-A
On 12/03/2019 17:42, Kasper Daniel Hansen wrote:
> But why do you want the same seed for the different features? That is not
> the right way to use stochastic methods.
>
> Best,
> Kasper
>
> On Tue, Mar 12, 2019 at 5:20 PM Bemis, Kylie <k.bemis using northeastern.edu>
> wrote:
>
>> Hi all,
>>
>> I remember similar questions coming up before, but couldn’t track any down
>> that directly pertain to my situation.
>>
>> Suppose I want to use bplapply() in a function to fit models to many
>> features, and I am applying over features. The models are stochastic, and I
>> want the results to be reproducible, and preferably use the same RNG seed
>> for each feature. So I could do:
>>
>> fitModels <- function(object, seed=1, BPPARAM=bpparam()) {
>>   bplapply(object, function(x) {
>>     set.seed(seed)
>>     fitModel(x)
>>   }, BPPARAM=BPPARAM)
>> }
>>
>> But the BioC guidelines say not to use set.seed() inside function code,
>> and I’ve seen other questions answered saying not to use “seed” as a
>> function parameter in this way.
>>
>> Is it preferable to check and modify .Random.seed directly, or is there
>> some other standard way of doing this?
>>
>> Thanks,
>> Kylie
>>
>> ~~~
>> Kylie Ariel Bemis
>> Khoury College of Computer Sciences
>> Northeastern University
>> kuwisdelu.github.io<https://kuwisdelu.github.io>
>>
>> _______________________________________________
>> Bioc-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>