[Bioc-devel] set.seed and BiocParallel

Aaron Lun
Wed Mar 13 03:36:41 CET 2019


I think Kylie is saying that she wants to use the same seed for each 
feature across different runs, but the seed can be different across 
features - which would make more sense.

Multi-worker reproducibility is an issue that we discussed before (the 
link goes into the middle of the thread):

https://stat.ethz.ch/pipermail/bioc-devel/2019-January/014505.html

The key point is that reproducibility is not the only concern. There is 
also the issue of correctness: each job should draw from a random number 
stream that is guaranteed to be independent of the others.

Some food for thought: in the vast majority of my parallelized 
applications, the heavy lifting (including the RNG'ing) is done in C++. 
If this is also the case for you, consider using the dqrng package to 
provide the C++ PRNG. I usually generate all my seeds in the serial part 
of the code, and then distribute seeds to the jobs where each job is set 
to a different "stream" value so that the sequence of random numbers is 
always different, regardless of the seed. As the serial seed generation 
is under the control of set.seed(), this provides correctness and 
reproducibility no matter how the jobs are distributed across workers.
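
To make that concrete, here is a minimal sketch of the idea at the R level. 
It assumes the dqrng package is installed and uses its R interface 
(dqset.seed() and dqrnorm()) as a stand-in for the C++ PRNG calls. 
fitModel() and fitModels() here are hypothetical toys, not Kylie's 
functions, and Map() stands in for the parallel apply; something like 
BiocParallel::bpmapply(..., SIMPLIFY=FALSE) could probably be swapped in.

```r
# Hypothetical stand-in for a stochastic model fit. It draws from dqrng's
# generator via the R interface; in the setup described above the draws
# would instead happen inside C++ code.
fitModel <- function(x) {
    mean(x) + dqrng::dqrnorm(1)
}

fitModels <- function(object) {
    # Serial part: seeds are drawn from R's own RNG, so a set.seed() call
    # made by the user *before* fitModels() controls everything downstream.
    seeds <- sample.int(.Machine$integer.max, length(object))
    # One distinct stream per job, so jobs never share a sequence even if
    # two of them happened to receive the same seed.
    streams <- seq_along(object)

    # Each job sets its own seed and stream before doing any random draws,
    # so the result does not depend on which worker runs which job.
    Map(function(x, seed, stream) {
        dqrng::dqset.seed(seed, stream)
        fitModel(x)
    }, object, seeds, streams)
}

features <- list(a = 1:5, b = 1:5, c = 6:10)

set.seed(100)
run1 <- fitModels(features)
set.seed(100)
run2 <- fitModels(features)

# Same user-level seed, same per-feature results across runs.
identical(run1, run2)
```

Note that the function itself never calls set.seed(), and there is no 
"seed" argument; the user controls reproducibility from the outside.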

-A

On 12/03/2019 17:42, Kasper Daniel Hansen wrote:
> But why do you want the same seed for the different features? That is not
> the right way to use stochastic methods.
> 
> Best,
> Kasper
> 
> On Tue, Mar 12, 2019 at 5:20 PM Bemis, Kylie <k.bemis at northeastern.edu>
> wrote:
> 
>> Hi all,
>>
>> I remember similar questions coming up before, but couldn’t track any down
>> that directly pertain to my situation.
>>
>> Suppose I want to use bplapply() in a function to fit models to many
>> features, and I am applying over features. The models are stochastic, and I
>> want the results to be reproducible, and preferably use the same RNG seed
>> for each feature. So I could do:
>>
>> fitModels <- function(object, seed=1, BPPARAM=bpparam()) {
>>     bplapply(object, function(x) {
>>         set.seed(seed)
>>         fitModel(x)
>>     }, BPPARAM=BPPARAM)
>> }
>>
>> But the BioC guidelines say not to use set.seed() inside function code,
>> and I’ve seen other questions answered saying not to use “seed” as a
>> function parameter in this way.
>>
>> Is it preferable to check and modify .Random.seed directly, or is there
>> some other standard way of doing this?
>>
>> Thanks,
>> Kylie
>>
>> ~~~
>> Kylie Ariel Bemis
>> Khoury College of Computer Sciences
>> Northeastern University
>> kuwisdelu.github.io<https://kuwisdelu.github.io>