[Bioc-devel] loading database package changes random number
Kasper Daniel Hansen
kasperdanielhansen using gmail.com
Thu May 23 17:09:28 CEST 2019
It seems to me that BiocParallel should not touch the parallel stream at
load. I don't see why that is necessary, but ok, I just have a vague
understanding of why it's doing it, so perhaps I am wrong.
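
A minimal sketch of the save-and-restore idea raised in the quoted messages (copy .Random.seed if it exists, restore it on exit, and handle the fresh-session case where no seed exists yet). The helper name with_preserved_seed and the port range are hypothetical, chosen only for illustration; this is not code from BiocParallel or DelayedArray.

```r
# Hypothetical helper (an assumption, not an existing Bioconductor function):
# evaluate `expr`, then put the global RNG state back the way it was found.
with_preserved_seed <- function(expr) {
    had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
    if (had_seed) {
        old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
    }
    on.exit({
        if (had_seed) {
            # Restore the state that was current before `expr` ran
            assign(".Random.seed", old_seed, envir = globalenv())
        } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
            # Fresh session: no seed existed before, so remove the new one
            rm(".Random.seed", envir = globalenv())
        }
    })
    expr  # the promise is forced here; on.exit() runs afterwards
}

# Reference draw without any intervening RNG use
set.seed(123)
reference <- runif(1)

# Same seed, but with a random draw in between (a stand-in for choosing a
# random port; the range 11000:11999 is arbitrary)
set.seed(123)
port <- with_preserved_seed(sample(11000:11999, 1))
stopifnot(identical(runif(1), reference))
```

The trade-off mentioned in the thread still applies: if the state is restored every time, two successive calls made from the same RNG state draw the same "random" value.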
On Thu, May 23, 2019 at 4:29 AM Steffi Grote <steffi_grote using eva.mpg.de>
wrote:
> Thank you Martin, that's exactly the problem.
> So for now I will just leave it like it is without setting a seed inside a
> function,
> and hope that the behaviour of DelayedArray might be updated. Anyway, I
> don't think it is a big problem.
>
> Best,
> Steffi
>
> > On May 22, 2019 at 5:02 PM Martin Morgan <mtmorgan.bioc using gmail.com> wrote:
> >
> >
> > I think the problem is that, even if the user were to set.seed(), it
> > will have different consequences depending on whether DelayedArray is
> > already loaded, or not yet loaded. DelayedArray gets loaded in some way
> > that is not transparent to the user, as a
> > dependency-of-a-dependency-of-an annotation package.
> >
> > I guess an acceptable solution would be for DelayedArray to remember and
> > restore the random number seed before creating a BiocParallel param, with
> > an edge case being that .Random.seed is NULL in a new R session.
> >
> > Martin
> >
> > On 5/22/19, 9:57 AM, "Kasper Daniel Hansen"
> > <kasperdanielhansen using gmail.com> wrote:
> >
> > Why don't you let this be under the user's control and not do this
> > at all? People should know that reproducibility of random numbers
> > requires setting the seed, but that is best done by the user and not
> > a package author.
> >
> > On Wed, May 22, 2019 at 9:30 AM Steffi Grote
> > <steffi_grote using eva.mpg.de> wrote:
> >
> >
> > Hi all,
> >
> > I tried to circumvent the problem by adding an optional seed as a
> > parameter like this:
> >
> > my_fun = function(..., seed = NULL) {
> >
> >     # code that might change the RNG
> >
> >     if (!is.null(seed)) {
> >         set.seed(seed)
> >     }
> >
> >     # code that runs permutations
> > }
> >
> > which solves the reproducibility issue, but gives me a Warning in
> > BiocCheck:
> > * WARNING: Remove set.seed usage in R code
> > Found in R/ directory functions:
> > my_fun()
> >
> > What is the best way to deal with this?
> >
> > Thanks in advance,
> > Steffi
> >
> >
> > > On April 12, 2019 at 1:10 AM Martin Morgan
> > > <mtmorgan.bioc using gmail.com> wrote:
> > >
> > >
> > > That easy strategy wouldn't work, for instance two successive calls
> > > to MulticoreParam() would get the same port assigned, rather than the
> > > contract of a 'random' port in a specific range; the port can be
> > > assigned by the manager.port= argument if the user wants to avoid
> > > random assignment. I could maintain a separate random number stream
> > > in BiocParallel for what amounts to a pretty trivial and probably
> > > dubious strategy [choosing random ports in hopes that one is not in
> > > use], but that starts to sound like a more substantial feature.
> > >
> > > Martin
> > >
> > > On 4/11/19, 7:06 PM, "Pages, Herve" <hpages using fredhutch.org> wrote:
> > >
> > > Hi Steffi,
> > >
> > > Any code that gets called between your calls to set.seed() and
> > > runif() could potentially use the random number generator. So the
> > > sequence set.seed(123); runif(1) is only guaranteed to be
> > > deterministic if no other code is called in between, or if the code
> > > called in between does not use the random number generator (but if
> > > that code is not under your control it could do anything).
> > >
> > > @Martin: I'll look at your suggestion for DelayedArray. An easy
> > > workaround would be to avoid changing the RNG state in BiocParallel
> > > by having .snowPort() make a copy of .Random.seed (if it exists)
> > > before calling runif() and restoring it on exit.
> > >
> > > H.
> > >
> > > On 4/11/19 15:25, Martin Morgan wrote:
> > > > This is actually from a dependency DelayedArray which, on load,
> > > > calls DelayedArray::setAutoBPPARAM, which calls
> > > > BiocParallel::MulticoreParam(), which uses the random number
> > > > generator to select a random port for connection.
> > > >
> > > > A different approach would be for DelayedArray to respect the
> > > > user's configuration and use bpparam(), or perhaps look at the
> > > > class of bpparam() and tell the user they should, e.g.,
> > > > BiocParallel::register(SerialParam()) if that's appropriate, or
> > > > use registered("MulticoreParam") or registered("SerialParam") if
> > > > available (they are by default) rather than creating an ad-hoc
> > > > instance.
> > > >
> > > > Martin
> > > >
> > > > On 4/11/19, 10:17 AM, "Bioc-devel on behalf of Steffi Grote"
> > > > <bioc-devel-bounces using r-project.org on behalf of
> > > > steffi_grote using eva.mpg.de> wrote:
> > > >
> > > > Hi all,
> > > > I found out that example code for my package GOfuncR yields a
> > > > different result the first time it's executed, despite setting a
> > > > seed. All the following executions are identical.
> > > > It turned out that loading the database package 'Homo.sapiens'
> > > > changed the random numbers:
> > > >
> > > > set.seed(123)
> > > > runif(1)
> > > > # [1] 0.2875775
> > > >
> > > > set.seed(123)
> > > > suppressWarnings(suppressMessages(require(Homo.sapiens)))
> > > > runif(1)
> > > > # [1] 0.7883051
> > > >
> > > > set.seed(123)
> > > > runif(1)
> > > > # [1] 0.2875775
> > > >
> > > > Is that known or expected behaviour?
> > > > Should I not load a package inside a function that later uses
> > > > random numbers?
> > > >
> > > > Thanks in advance,
> > > > Steffi
> > > >
> > > > _______________________________________________
> > > > Bioc-devel using r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > > >
> > >
> > > --
> > > Hervé Pagès
> > >
> > > Program in Computational Biology
> > > Division of Public Health Sciences
> > > Fred Hutchinson Cancer Research Center
> > > 1100 Fairview Ave. N, M1-B514
> > > P.O. Box 19024
> > > Seattle, WA 98109-1024
> > >
> > > E-mail: hpages using fredhutch.org
> > > Phone: (206) 667-5791
> > > Fax: (206) 667-1319
> > >
> > >
> >
> > --
> > Best,
> > Kasper
> >
> >
> >
>
--
Best,
Kasper