[Rd] Which function can change RNG state?
Paul Gilbert
pgilbert902 at gmail.com
Mon Feb 9 06:03:11 CET 2015
On 02/08/2015 09:33 AM, Dirk Eddelbuettel wrote:
>
> On 7 February 2015 at 19:52, otoomet wrote:
> | random numbers. For instance, can I be sure that
> | set.seed(0); print(runif(1)); print(rnorm(1))
> | will always print the same numbers, also in the future version of R? There
>
> Yes, pretty much.
This is nearly correct. The user could change the uniform or normal
generator, since there are options other than the defaults, and then the
result would be different. And obviously, if the print precision is
changed, the printed result may be truncated differently.
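For example, roughly (just a sketch of the point, using the standard
set.seed()/RNGkind() arguments):

  set.seed(0)
  rnorm(1)                                 # default normal generator (Inversion)

  set.seed(0, normal.kind = "Box-Muller")  # same seed, different normal generator
  rnorm(1)                                 # a different value

  RNGkind(normal.kind = "default")         # put the default back afterwards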
I think you could prepare for future versions of R by saving information
about the generators you are using. The precedent has already been set
(R-1.7.0) that the default could change if there is a good reason. A
good reason might be that the RNG is found not to be so good relative to
others that become available. But I think the old generator would
continue to be available, so people can reproduce old results. (Package
setRNG has some utilities to help save and reset, but there is nothing
especially difficult or fancy, just a few details that need to be
remembered.)
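Roughly, the things to remember are along these lines (a sketch only,
not the setRNG interface itself; the object names are just illustrative):

  kind <- RNGkind()               # the uniform and normal generators in use
  set.seed(0)
  seed <- .Random.seed            # the full generator state after seeding
  x <- runif(5)

  # ... later, perhaps under a future R with a different default generator:
  do.call(RNGkind, as.list(kind))
  assign(".Random.seed", seed, envir = .GlobalEnv)
  stopifnot(identical(runif(5), x))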
>
> I've been lurking here over fifteen years, and while I am getting old and
> forgetful I can remember exactly one such change where behaviour was changed,
> and (one of the) generators was altered---if memory serves, in the early
> days of R 1.*. [Goes digging...] Yes, see `help(RNGkind)` which
> details that R 1.7.0 made a change when "Buggy Kinderman-Ramage" was added as
> the old value, and "Kinderman-Ramage" was repaired. There once was a similar
> fix in the very early days of the Mersenne-Twister, which is why the GNU GSL
> has two variants with suffixes _1999 and _1998.
I seem to recall a bit of a change around R-0.49, but old and forgetful
would cover this too. For me, a bigger change was an unadvertised change
in Splus - at some point they compiled against a different math library.
This changed the lower bits of results; mostly insignificant, but in
accumulated simulation results the differences could amount to something
fairly important. The amount of time I spent trying to find out why
results would not reproduce was one of my main motivations for starting
to use R.
>
> So your issue seems like pilot error to me: don't attach the parallel package
> if you do not plan to work in parallel. But "do if you do", and see its fine
> vignette on how it provides you reproducibility for multiple RNG streams.
>
> In general, you can very much trust R (and R Core) in these matters.
>
> Dirk
On 02/08/2015 09:40 AM, Gábor Csárdi wrote:
> I don't know if there is intention to keep this reproducible across R
> versions, but it is already not reproducible across platforms (with
> the same R version):
>
> http://stackoverflow.com/questions/21212326/floating-point-arithmetic-and-reproducibility
The situation is better in some respects, and worse in others, than what
is described on stackoverflow. I think the point is made pretty well
there that you should not be trying to reproduce results beyond machine
precision. My experience is that you can usually compare within a fuzz
of 1e-14, even across platforms. (The package setRNG on CRAN has a
function random.number.test() which is run in the package's tests/ and
makes uniform and normal comparisons to 1e-14. It has passed checks on
all R platforms since 2004. Actually, the checks have been done since
about 1995, but they were part of package dse earlier.) If you
accumulate lots of lower-order parts (e.g. sum(simulated - true) in a
long Monte Carlo) then the fuzz may need to be much larger, especially
when comparing across platforms. And you will have trouble with
numerically unstable calculations. Once upon a time I was annoyed by
this, but then I realized that it was better not to do unstable
calculations.
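A comparison in that spirit looks something like this (the reference
values here are a placeholder; normally they would be stored from an
earlier run or another platform):

  set.seed(0)
  new <- runif(5)
  ref <- new                       # placeholder; normally read from a file
  # compare within a fuzz rather than expecting bit-for-bit equality
  stopifnot(isTRUE(all.equal(new, ref, tolerance = 1e-14)))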
In addition to not being reproducible beyond machine precision across R
versions and across platforms, reproducibility is really not guaranteed
even on the same platform with the same version of R. You may get
different results if you upgrade the OS and there has been a change in
the math libraries. In my experience this happens rather often. I don't
think there is any specific 32 vs 64 bit issue, but math libraries
sometimes do things a bit differently on different processors (e.g.
processor bug fixes), so you can occasionally get differences with
everything the same except the hardware.
On 02/07/2015 10:52 PM, otoomet wrote:
> It turned out that this is because package "parallel", buried deep
> in my dependencies, calls runif() during its initialization and
> in this way changes the random number sequence.
Guessing a bit about what you are saying:
  1/ you set the random seed,
  2/ you did some things which included loading package parallel,
  3/ you ran some things for which you expected results comparable to
     some previous run in which you did 1/ and 2/ in the reverse order.
If I understand this correctly, I suggest you always do everything
exactly the same after you set the seed. There are lots of things that
could generate random numbers without you really knowing. Thus, it is
usually better to set the seed immediately before you start doing
anything where you want the seed to have a known state. (There is an
even better suggestion in the somewhat dated vignette with package setRNG.)
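In code, the difference looks something like this (a sketch; the exact
trigger in your case was the load of parallel):

  # Anything between set.seed() and the computation that happens to draw
  # random numbers will shift the stream:
  set.seed(42)
  invisible(runif(1))   # stands in for a package load or other setup that
                        # consumes random numbers behind the scenes
  runif(1)              # not the value you would get right after set.seed(42)

  # Suggested practice: do all loading and setup first, then set the seed
  # immediately before the part you want to be reproducible:
  library(parallel)     # or whatever setup is needed
  set.seed(42)
  runif(1)              # comparable across runs that use the same setup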
Finally, if you do intend to use parallel sometimes, then you have
additional considerations. You would like to get the same results no
matter how many machines you are using. This may place some constraints
on the generators you use, since not all of them are equally easy to use
in parallel. So if you are hoping to get the same results in parallel as
you get on a single machine, then you had better start out using
generators on the single machine that you will also be able to use in
parallel.
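One common arrangement (described in the parallel vignette; only
sketched here) is to use the L'Ecuyer generator, which supports multiple
streams:

  library(parallel)
  RNGkind("L'Ecuyer-CMRG")               # generator designed for multiple streams
  set.seed(123)

  cl <- makeCluster(2)
  clusterSetRNGStream(cl, iseed = 123)   # reproducible per-worker streams
  res <- parLapply(cl, 1:4, function(i) runif(1))
  stopCluster(cl)

  # This is reproducible for a fixed number of workers; making results
  # independent of the worker count needs per-task streams, as discussed
  # in the vignette.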
Paul