[Bioc-devel] Using SerialParam() as the registered back-end for all platforms

Tue Jan 8 16:13:08 CET 2019

On Mon, Jan 7, 2019 at 3:26 PM Henrik Bengtsson <henrik.bengtsson using gmail.com>
wrote:

>
> 1. To achieve fully numerically reproducible RNGs in a way that is
> *invariant to the number of workers* (amount of chunking), I think the
> only solution is to pregenerate RNG seeds (using
> parallel::nextRNGStream()) for each individual iteration (element).
> In other words, if a worker will process K elements, then the main R
> process needs to generate K RNG seeds and pass those along to the
> worker.  I use this approach for future.apply::future_lapply(...,
> future.seed = TRUE/<initial_seed>), which then produces identical RNG
> results regardless of backend and amount of chunking.  In the past, I
> think I've seen Martin suggesting something similar as a manual
> approach to some users.
>
> 2. The above approach is obviously expensive, especially when there
> are a large number of elements to iterate over.  Because of this I'm
> thinking of providing an option to use only one RNG seed per worker
> (which is the common approach used elsewhere)
> [https://github.com/HenrikBengtsson/future.apply/issues/20].  This
> won't be invariant to the number of workers, but it "should" still be
> statistically sound.  This approach will give reproducible RNG results
> given the same initial seed and the same amount of chunking.
>
> 3. For algorithms which do not rely on RNG, we can ignore both of the
> above.  The problem is that it's not always known to the
> user/developer which methods depend on RNG or not.  The above 'RNG
> tracker' helps to identify some, but things might also change over
> time.  I believe there's room for automating this in one way or the
> other.  For instance, having a way to declare whether a function
> depends on RNG could help.  Static code inspection could also do it,
> e.g. when an R package is built, and it could be validated as part of
> R CMD check.
>
> 4. Are there other approaches?
>
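
For reference, points 1 and 2 above can be sketched in a few lines of
base R plus the parallel package. This is only an illustration, not how
future.apply or BiocParallel implement it: the helper names
(make_seeds, seeded_lapply, chunk_seeded_lapply) are made up for this
example, and chunks are processed sequentially with lapply() as a
stand-in for real workers.

```r
# Hypothetical helper: pregenerate n L'Ecuyer-CMRG stream seeds from one
# initial seed. Note that this switches the session's RNG kind.
make_seeds <- function(n, initial_seed) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(initial_seed)
  seed <- get(".Random.seed", envir = globalenv())
  seeds <- vector("list", n)
  for (i in seq_len(n)) {
    seeds[[i]] <- seed
    seed <- parallel::nextRNGStream(seed)
  }
  seeds
}

# Split the indices of X into n_chunks contiguous chunks.
make_chunks <- function(X, n_chunks) {
  chunk_id <- sort(rep_len(seq_len(n_chunks), length(X)))
  split(seq_along(X), chunk_id)
}

# Point 1: one seed per element. The RNG state is reset before every
# element, so the result does not depend on how X is chunked.
seeded_lapply <- function(X, FUN, seeds, n_chunks = 1L) {
  res <- lapply(make_chunks(X, n_chunks), function(ii) {
    lapply(ii, function(i) {
      assign(".Random.seed", seeds[[i]], envir = globalenv())
      FUN(X[[i]])
    })
  })
  unlist(res, recursive = FALSE, use.names = FALSE)
}

# Point 2: one seed per chunk. Cheaper, but only reproducible for the
# same initial seed and the same chunking.
chunk_seeded_lapply <- function(X, FUN, seeds, n_chunks = 1L) {
  chunks <- make_chunks(X, n_chunks)
  res <- lapply(seq_along(chunks), function(k) {
    assign(".Random.seed", seeds[[k]], envir = globalenv())
    lapply(chunks[[k]], function(i) FUN(X[[i]]))
  })
  unlist(res, recursive = FALSE, use.names = FALSE)
}

draw <- function(x) rnorm(1)
seeds <- make_seeds(6, initial_seed = 42)

# Per-element seeds: 1 chunk vs 3 chunks
a <- seeded_lapply(1:6, draw, seeds, n_chunks = 1)
b <- seeded_lapply(1:6, draw, seeds, n_chunks = 3)

# Per-chunk seeds: repeated with the same chunking
c1 <- chunk_seeded_lapply(1:6, draw, seeds, n_chunks = 3)
c2 <- chunk_seeded_lapply(1:6, draw, seeds, n_chunks = 3)
```

Here identical(a, b) holds because every element starts from its own
stream, whereas the per-chunk variant only guarantees identical(c1, c2)
when the chunking is unchanged. The cost of the first variant is the
loop in make_seeds(), which grows with the number of elements rather
than the number of workers.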

I don't suppose it's possible to quickly determine via static analysis
whether a piece of code uses the RNG?
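
A rough sketch of what a purely static check could look like, assuming
it is enough to scan a function body for the names of known RNG
functions. The list rng_functions and the helper uses_rng_directly are
hypothetical and incomplete; the check only sees names that appear
literally in the body, so it does not follow calls into other
functions, does not see compiled code, and misses calls made through
character strings.

```r
# Hypothetical, non-exhaustive list of functions that touch the RNG.
rng_functions <- c("runif", "rnorm", "rbinom", "rpois", "rexp",
                   "sample", "sample.int", "set.seed", "RNGkind")

# TRUE if the body of 'fun' mentions one of the names above.
# Default argument values are not part of the body and are not scanned.
uses_rng_directly <- function(fun) {
  stopifnot(is.function(fun))
  any(all.names(body(fun)) %in% rng_functions)
}

f1 <- function(n) stats::rnorm(n)             # direct call, detected
f2 <- function(x) x + 1                       # no RNG use
f3 <- function(n) do.call("rnorm", list(n))   # call via string, missed

r1 <- uses_rng_directly(f1)
r2 <- uses_rng_directly(f2)
r3 <- uses_rng_directly(f3)
```

The f3 case shows why a check like this can only flag some RNG users
and cannot prove that a function is RNG-free.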
