[Bioc-devel] loading database package changes random number

Kasper Daniel Hansen kasperdanielhansen using gmail.com
Thu May 23 17:09:28 CEST 2019


It seems to me that BiocParallel should not touch the parallel stream at
load. I don't see why that is necessary, but ok, I just have a vague
understanding of why it's doing it, so perhaps I am wrong.
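
For reference, the save-and-restore idea that comes up in the quoted
messages below could be sketched roughly like this. It is only an
illustration under assumptions: with_preserved_rng() is a made-up helper,
not something that exists in BiocParallel or DelayedArray, and the
sample.int() call merely stands in for whatever code draws the random port.

```r
# Hypothetical helper (not part of BiocParallel or DelayedArray): evaluate
# `expr`, then put the global RNG state back the way it was before.
with_preserved_rng <- function(expr) {
    had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
    if (had_seed)
        old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
    on.exit({
        if (had_seed) {
            assign(".Random.seed", old_seed, envir = globalenv())
        } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
            # Edge case: no seed existed before (fresh session), so drop the new one.
            rm(".Random.seed", envir = globalenv())
        }
    })
    expr  # the promise is forced here, while the on.exit handler is pending
}

set.seed(123)
x <- runif(1)

set.seed(123)
# Stand-in for the random port choice; the range is an arbitrary assumption.
port <- with_preserved_rng(11000L + sample.int(1000L, 1L))
y <- runif(1)

identical(x, y)
```

As pointed out further down the thread, restoring the state like this also
means that two successive calls would draw the same value, so it is not a
drop-in fix for the port selection.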

On Thu, May 23, 2019 at 4:29 AM Steffi Grote <steffi_grote using eva.mpg.de>
wrote:

> Thank you Martin, that's exactly the problem.
> So for now I will just leave it like it is without setting a seed inside a
> function, and hope that the behaviour of DelayedArray might be updated.
> Anyway, I don't think it is a big problem.
>
> Best,
> Steffi
>
> > On May 22, 2019 at 5:02 PM Martin Morgan <mtmorgan.bioc using gmail.com> wrote:
> >
> >
> > I think the problem is that, even if the user were to set.seed(), it
> > will have different consequences depending on whether DelayedArray is
> > already loaded, or not yet loaded. DelayedArray gets loaded in some way
> > that is not transparent to the user, as a dependency-of-a-dependency-of-an
> > annotation package.
> >
> > I guess an acceptable solution would be for DelayedArray to remember and
> > restore the random number seed before creating a BiocParallel param, with
> > an edge case being that .Random.seed is NULL in a new R session.
> >
> > Martin
> >
> > On 5/22/19, 9:57 AM, "Kasper Daniel Hansen" <kasperdanielhansen using gmail.com> wrote:
> >
> >     Why don't you let this be under the user's control and not do this
> >     at all? People should know that reproducibility of random numbers
> >     requires setting the seed, but that is best done by the user and not
> >     a package author.
> >
> >     On Wed, May 22, 2019 at 9:30 AM Steffi Grote <steffi_grote using eva.mpg.de> wrote:
> >
> >
> >     Hi all,
> >
> >     I tried to circumvent the problem by adding an optional seed as a
> >     parameter like this:
> >
> >     my_fun = function(..., seed = NULL){
> >
> >         # code that might change the RNG
> >
> >         if (!is.null(seed)){
> >             set.seed(seed)
> >         }
> >
> >         # code that runs permutations
> >     }
> >
> >     which solves the reproducibility issue, but gives me a Warning in
> >     BiocCheck:
> >         * WARNING: Remove set.seed usage in R code
> >           Found in R/ directory functions:
> >             my_fun()
> >
> >     What is the best way to deal with this?
> >
> >     Thanks in advance,
> >     Steffi
> >
> >
> >     > On April 12, 2019 at 1:10 AM Martin Morgan <mtmorgan.bioc using gmail.com> wrote:
> >     >
> >     >
> >     > That easy strategy wouldn't work, for instance two successive calls to
> >     > MulticoreParam() would get the same port assigned, rather than the
> >     > contract of a 'random' port in a specific range; the port can be
> >     > assigned by the manager.port= argument if the user wants to avoid
> >     > random assignment. I could maintain a separate random number stream in
> >     > BiocParallel for what amounts to a pretty trivial and probably dubious
> >     > strategy [choosing random ports in hopes that one is not in use], but
> >     > that starts to sound like a more substantial feature.
> >     >
> >     > Martin
> >     >
> >     > On 4/11/19, 7:06 PM, "Pages, Herve" <hpages using fredhutch.org> wrote:
> >     >
> >     >     Hi Steffi,
> >     >
> >     >     Any code that gets called between your calls to set.seed() and
> >     >     runif() could potentially use the random number generator. So the
> >     >     sequence set.seed(123); runif(1) is only guaranteed to be
> >     >     deterministic if no other code is called in between, or if the code
> >     >     called in between does not use the random number generator (but if
> >     >     that code is not under your control it could do anything).
> >     >
> >     >     @Martin: I'll look at your suggestion for DelayedArray. An easy
> >     >     workaround would be to avoid changing the RNG state in BiocParallel
> >     >     by having .snowPort() make a copy of .Random.seed (if it exists)
> >     >     before calling runif() and restoring it on exit.
> >     >
> >     >     H.
> >     >
> >     >     On 4/11/19 15:25, Martin Morgan wrote:
> >     >     > This is actually from a dependency DelayedArray which, on load,
> >     >     > calls DelayedArray::setAutoBPPARAM, which calls
> >     >     > BiocParallel::MulticoreParam(), which uses the random number
> >     >     > generator to select a random port for connection.
> >     >     >
> >     >     > A different approach would be for DelayedArray to respect the
> >     >     > user's configuration and use bpparam(), or perhaps look at the
> >     >     > class of bpparam() and tell the user they should, e.g.,
> >     >     > BiocParallel::register(SerialParam()) if that's appropriate, or use
> >     >     > registered("MulticoreParam") or registered("SerialParam") if
> >     >     > available (they are by default) rather than creating an ad-hoc
> >     >     > instance.
> >     >     >
> >     >     > Martin
> >     >     >
> >     >     > On 4/11/19, 10:17 AM, "Bioc-devel on behalf of Steffi Grote"
> >     >     > <bioc-devel-bounces using r-project.org on behalf of steffi_grote using eva.mpg.de> wrote:
> >     >     >
> >     >     >      Hi all,
> >     >     >      I found out that example code for my package GOfuncR yields
> >     >     >      a different result the first time it's executed, despite
> >     >     >      setting a seed. All the following executions are identical.
> >     >     >      It turned out that loading the database package
> >     >     >      'Homo.sapiens' changed the random numbers:
> >     >     >
> >     >     >      set.seed(123)
> >     >     >      runif(1)
> >     >     >      # [1] 0.2875775
> >     >     >
> >     >     >      set.seed(123)
> >     >     >      suppressWarnings(suppressMessages(require(Homo.sapiens)))
> >     >     >      runif(1)
> >     >     >      # [1] 0.7883051
> >     >     >
> >     >     >      set.seed(123)
> >     >     >      runif(1)
> >     >     >      # [1] 0.2875775
> >     >     >
> >     >     >      Is that known or expected behaviour?
> >     >     >      Should I not load a package inside a function that later
> >     >     >      uses random numbers?
> >     >     >
> >     >     >      Thanks in advance,
> >     >     >      Steffi
> >     >     >
> >     >     >      _______________________________________________
> >     >     >      Bioc-devel using r-project.org mailing list
> >     >     >
> >     >     >      https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >     >     >
> >     >
> >     >     --
> >     >     Hervé Pagès
> >     >
> >     >     Program in Computational Biology
> >     >     Division of Public Health Sciences
> >     >     Fred Hutchinson Cancer Research Center
> >     >     1100 Fairview Ave. N, M1-B514
> >     >     P.O. Box 19024
> >     >     Seattle, WA 98109-1024
> >     >
> >     >     E-mail: hpages using fredhutch.org
> >     >     Phone:  (206) 667-5791
> >     >     Fax:    (206) 667-1319
> >     >
> >     >
> >
> >     _______________________________________________
> >     Bioc-devel using r-project.org mailing list
> >     https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
> >
> >
> >
> >
> >
> >     --
> >     Best,
> >     Kasper
> >
> >
> >
>


-- 
Best,
Kasper
