[Bioc-devel] loading database package changes random number

Kasper Daniel Hansen k@@perd@n|e|h@n@en @end|ng |rom gm@||@com
Wed May 22 15:57:13 CEST 2019


Why don't you leave this under the user's control and not set the seed at all?
People should know that reproducing random numbers requires setting the
seed, but that is best done by the user, not by the package author.
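A minimal sketch of the approach described above, in which the package code draws random numbers but never calls `set.seed()`, and the caller seeds the generator right before the call. `perm_test()` is a hypothetical two-sample permutation test, not a GOfuncR function. This stays reproducible only if nothing inside the function draws random numbers before the permutations. Loading a package such as Homo.sapiens before calling `set.seed()` (e.g. at the top of a script) avoids the shift reported in this thread, because a package that is already loaded is not loaded again.

```r
# Hypothetical package function: permutation test on the difference in
# means. It uses the RNG but never seeds it; seeding is the caller's job.
perm_test <- function(x, y, n_perm = 1000) {
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  n_x <- length(x)
  perm_stats <- replicate(n_perm, {
    idx <- sample.int(length(pooled), n_x)   # random relabelling
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # two-sided p-value, counting the observed labelling as one permutation
  (sum(abs(perm_stats) >= abs(observed)) + 1) / (n_perm + 1)
}

x <- c(5.1, 4.8, 6.2, 5.9, 5.5)
y <- c(4.2, 4.0, 4.9, 4.4, 4.6)

# The user, not the package, makes the run reproducible.
set.seed(123)
p1 <- perm_test(x, y)
set.seed(123)
p2 <- perm_test(x, y)
identical(p1, p2)  # TRUE
```

Because the function contains no `set.seed()` call, it should also not trigger the BiocCheck warning discussed below.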

On Wed, May 22, 2019 at 9:30 AM Steffi Grote <steffi_grote using eva.mpg.de>
wrote:

> Hi all,
>
> I tried to work around the problem by adding an optional seed parameter,
> like this:
>
> my_fun <- function(..., seed = NULL) {
>     # code that might change the RNG state (e.g. loading a package)
>
>     if (!is.null(seed)) {
>         set.seed(seed)
>     }
>
>     # code that runs permutations
> }
>
> This solves the reproducibility issue, but it triggers a warning in
> BiocCheck:
>     * WARNING: Remove set.seed usage in R code
>       Found in R/ directory functions:
>         my_fun()
>
> What is the best way to deal with this?
>
> Thanks in advance,
> Steffi
>
>
> > On April 12, 2019 at 1:10 AM Martin Morgan <mtmorgan.bioc using gmail.com> wrote:
> >
> >
> > That easy strategy wouldn't work: for instance, two successive calls to
> > MulticoreParam() would be assigned the same port, breaking the contract
> > of a 'random' port in a specific range. Users who want to avoid random
> > assignment can set the port with the manager.port= argument. I could
> > maintain a separate random number stream in BiocParallel for what amounts
> > to a fairly trivial and probably dubious strategy (choosing random ports
> > in the hope that one is not in use), but that starts to sound like a more
> > substantial feature.
> >
> > Martin
> >
> > On 4/11/19, 7:06 PM, "Pages, Herve" <hpages using fredhutch.org> wrote:
> >
> >     Hi Steffi,
> >
> >     Any code that gets called between your calls to set.seed() and runif()
> >     could potentially use the random number generator. So the sequence
> >     set.seed(123); runif(1) is only guaranteed to be deterministic if no
> >     other code is called in between, or if the code called in between does
> >     not use the random number generator (but if that code is not under your
> >     control, it could do anything).
> >
> >     @Martin: I'll look at your suggestion for DelayedArray. An easy
> >     workaround would be to avoid changing the RNG state in BiocParallel by
> >     having .snowPort() make a copy of .Random.seed (if it exists) before
> >     calling runif() and restoring it on exit.
> >
> >     H.
> >
> >     On 4/11/19 15:25, Martin Morgan wrote:
> >     > This actually comes from a dependency, DelayedArray, which on load
> >     > calls DelayedArray::setAutoBPPARAM(), which calls
> >     > BiocParallel::MulticoreParam(), which uses the random number generator
> >     > to select a random port for the connection.
> >     >
> >     > A different approach would be for DelayedArray to respect the user's
> >     > configuration and use bpparam(), or perhaps look at the class of
> >     > bpparam() and tell the user they should, e.g.,
> >     > BiocParallel::register(SerialParam()) if that's appropriate, or use
> >     > registered("MulticoreParam") or registered("SerialParam") if available
> >     > (they are by default) rather than creating an ad-hoc instance.
> >     >
> >     > Martin
> >     >
> >     > On 4/11/19, 10:17 AM, "Bioc-devel on behalf of Steffi Grote"
> >     > <bioc-devel-bounces using r-project.org on behalf of steffi_grote using eva.mpg.de>
> >     > wrote:
> >     >
> >     >      Hi all,
> >     >      I found out that the example code for my package GOfuncR yields a
> >     >      different result the first time it's executed, despite setting a
> >     >      seed. All subsequent executions are identical.
> >     >      It turned out that loading the database package 'Homo.sapiens'
> >     >      changed the random numbers:
> >     >
> >     >      set.seed(123)
> >     >      runif(1)
> >     >      # [1] 0.2875775
> >     >
> >     >      set.seed(123)
> >     >      suppressWarnings(suppressMessages(require(Homo.sapiens)))
> >     >      runif(1)
> >     >      # [1] 0.7883051
> >     >
> >     >      set.seed(123)
> >     >      runif(1)
> >     >      # [1] 0.2875775
> >     >
> >     >      Is that known or expected behaviour?
> >     >      Should I not load a package inside a function that later uses
> >     >      random numbers?
> >     >
> >     >      Thanks in advance,
> >     >      Steffi
> >     >
> >     >      _______________________________________________
> >     >      Bioc-devel using r-project.org mailing list
> >     >      https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >     >
> >
> >     --
> >     Hervé Pagès
> >
> >     Program in Computational Biology
> >     Division of Public Health Sciences
> >     Fred Hutchinson Cancer Research Center
> >     1100 Fairview Ave. N, M1-B514
> >     P.O. Box 19024
> >     Seattle, WA 98109-1024
> >
> >     E-mail: hpages using fredhutch.org
> >     Phone:  (206) 667-5791
> >     Fax:    (206) 667-1319
> >
> >
>
>


-- 
Best,
Kasper
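Hervé's explanation and his suggested workaround can be sketched in base R. The first part reproduces the effect Steffi saw without Homo.sapiens: any call that draws from the RNG between `set.seed()` and `runif()` shifts the stream. The second part copies `.Random.seed` before drawing and restores it on exit. `pick_port_unsafe()`, `pick_port_safe()`, `with_preserved_seed()` and the port range are hypothetical stand-ins, not BiocParallel's actual `.snowPort()` code.

```r
# Hypothetical stand-in for code that draws from the RNG as a side effect,
# like choosing a random port. The port range is an assumption.
pick_port_unsafe <- function() sample(11000:11999, 1)

set.seed(123)
a1 <- runif(1)
set.seed(123)
invisible(pick_port_unsafe())   # consumes random numbers
b1 <- runif(1)
identical(a1, b1)  # FALSE: the intermediate call shifted the stream

# Evaluate `expr`, then put the global RNG state back as it was.
with_preserved_seed <- function(expr) {
  env <- globalenv()
  had_seed <- exists(".Random.seed", envir = env, inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = env, inherits = FALSE)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = env)
    } else if (exists(".Random.seed", envir = env, inherits = FALSE)) {
      # no seed existed before; drop the one created while evaluating `expr`
      rm(list = ".Random.seed", envir = env)
    }
  })
  expr  # lazy evaluation: `expr` is only evaluated here, after the copy
}

pick_port_safe <- function() with_preserved_seed(sample(11000:11999, 1))

set.seed(123)
a2 <- runif(1)
set.seed(123)
invisible(pick_port_safe())     # still draws a port...
b2 <- runif(1)
identical(a2, b2)  # TRUE: ...but the caller's stream is unchanged
```

The restore keeps a seed-then-draw sequence deterministic, but it has the cost Martin points out: because the state is reset after every call, successive calls to `pick_port_safe()` return the same port.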



