[Rd] Which function can change RNG state?

Paul Gilbert pgilbert902 at gmail.com
Mon Feb 9 06:03:11 CET 2015


On 02/08/2015 09:33 AM, Dirk Eddelbuettel wrote:
>
> On 7 February 2015 at 19:52, otoomet wrote:
> | random numbers.   For instance, can I be sure that
> | set.seed(0); print(runif(1)); print(rnorm(1))
> | will always print the same numbers, also in the future version of R?  There
>
> Yes, pretty much.

This is nearly correct. The user could change the uniform or normal 
generator, since there are options other than the defaults, in which 
case the result would be different. And obviously, if they changed the 
print precision then the printed result may be truncated differently.
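 
For instance (a small illustration; the particular values printed 
depend on your platform and settings):

    set.seed(0); rnorm(1)                  # default normal generator ("Inversion")
    RNGkind(normal.kind = "Box-Muller")    # switch the normal generator
    set.seed(0); rnorm(1)                  # same seed, a different value
    RNGkind(normal.kind = "default")       # back to the default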

I think you could prepare for future versions of R by saving information 
about the generators you are using. The precedent has already been set 
(R-1.7.0) that the default could change if there is a good reason. A 
good reason might be that the RNG is found not to be so good relative to 
others that become available. But I think the old generator would 
continue to be available, so people can reproduce old results. (Package 
setRNG has some utilities to help save and reset, but there is nothing 
especially difficult or fancy, just a few details that need to be 
remembered.)
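 
A rough base-R sketch of the sort of thing setRNG takes care of 
(assuming the RNG has been used at least once, so that .Random.seed 
already exists):

    ## record the generator kinds and the current seed vector
    saved.kind <- RNGkind()
    saved.seed <- .Random.seed

    ## ... work that may draw from the stream or change the generators ...

    ## put everything back
    do.call(RNGkind, as.list(saved.kind))
    assign(".Random.seed", saved.seed, envir = globalenv())
 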
>
> I've been lurking here over fifteen years, and while I am getting old and
> forgetful I can remember exactly one such change where behaviour was changed,
> and (one of the) generators was altered---if memory serves, in the early
> R 1.* days. [Goes digging...] Yes, see `help(RNGkind)` which
> details that R 1.7.0 made a change when "Buggy Kinderman-Ramage" was added as
> the old value, and "Kinderman-Ramage" was repaired.  There once was a similar
> fix in the very early days of the Mersenne-Twister which is why the GNU GSL
> has two variants with suffixes _1998 and _1999.

I seem to recall a bit of a change around R-0.49, but old and forgetful 
would cover this too. For me, a bigger change was an unadvertised one in 
Splus - at some point they compiled against a different math library. 
That changed the lower bits of results; mostly insignificant, but 
accumulated over a long simulation the differences could amount to 
something fairly important. The amount of time I spent trying to find 
out why results would not reproduce was one of my main motivations for 
starting to use R.
>
> So your issue seems like pilot error to me:  don't attach the parallel package
> if you do not plan to work in parallel.  But "do if you do", and see its fine
> vignette on how it provides you reproducibility for multiple RNG streams.
>
> In general, you can very much trust R (and R Core) in these matters.
>
> Dirk

On 02/08/2015 09:40 AM, Gábor Csárdi wrote:
 > I don't know if there is intention to keep this reproducible across R
 > versions, but it is already not reproducible across platforms (with
 > the same R version):
 > http://stackoverflow.com/questions/21212326/floating-point-arithmetic-and-reproducibility

The situation is better in some respects, and worse in others, than what 
is described on stackoverflow. I think the point is made pretty well 
there that you should not be trying to reproduce results beyond machine 
precision. My experience is that you can compare within a fuzz of 1e-14 
usually, even across platforms. (The package setRNG on CRAN has a 
function random.number.test() which is run in the package's tests/ and 
makes uniform and normal comparisons to 1e-14. It has passed checks on 
all R platforms since 2004. Actually, the checks have been done since 
about 1995, but they were part of package dse earlier.)  If you 
accumulate lots of lower-order parts (e.g. sum(simulated - true) in a 
long Monte Carlo) then the fuzz may need to be much larger, especially 
when comparing across platforms. And you will have trouble with 
numerically unstable calculations. Once upon a time I was annoyed by 
this, but then I realized that it was better not to do unstable 
calculations.
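 
A sketch of that kind of fuzzed comparison (the reference file here is 
hypothetical, and this is not the actual random.number.test() code):

    ## hypothetical file of reference draws recorded earlier (possibly on
    ## another platform) with the same seed and generator settings
    ref <- readRDS("reference-draws.rds")

    set.seed(42)
    x <- runif(length(ref))

    fuzz <- 1e-14
    if (max(abs(x - ref)) > fuzz) stop("uniform draws differ by more than the fuzz")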

In addition to results not being reproducible beyond machine precision 
across R versions and across platforms, you really cannot be guaranteed 
reproducibility even on the same platform and the same version of R. You 
may get different results if you upgrade the OS and there has been a 
change in the math libraries. In my experience this happens rather 
often. I don't think there is any specific 32- vs 64-bit issue, but math 
libraries sometimes do things a bit differently on different processors 
(e.g. processor bug fixes), so you can occasionally get differences with 
everything the same except the hardware.


On 02/07/2015 10:52 PM, otoomet wrote:
 > It turned out that this is because package "parallel", buried deep
 > in my dependencies, calls runif() during its initialization and
 > in this way changes the random number sequence.

Guessing a bit about what you are saying: 1/ you set the random seed, 
2/ you did some things which included loading package parallel, and 
3/ you ran some things for which you expected results comparable to 
some previous run in which you did 1/ and 2/ in the reverse order.

If I understand this correctly, I suggest you always do everything 
exactly the same after you set the seed. There are lots of things that 
could generate random numbers without you really knowing. Thus, it is 
usually better to set the seed immediately before you start doing 
anything where you want the seed to have a known state. (There is an 
even better suggestion in the somewhat dated vignette with package setRNG.)
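 
For instance (a minimal sketch; "somePkg" stands for any hypothetical 
package whose load code happens to draw from the stream):

    set.seed(0)
    library(somePkg)   # hypothetical package whose startup code calls runif()
    runif(1)           # not the value you recorded when nothing drew in between

    ## safer: do the setup first, then set the seed immediately before the
    ## part that must be reproducible
    library(somePkg)
    set.seed(0)
    runif(1)           # reproducible no matter what the setup code did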

Finally, if you do intend to use parallel sometimes, then you have 
additional considerations. You would like to get the same results no 
matter how many machines you are using. This may place some constraints 
on the generators you use; not all are equally easy to use in parallel. 
So, if you are hoping to get the same results in parallel as you get on 
a single machine, then you had better start out using generators on the 
single machine that you will also be able to use in parallel.
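 
One way to set that up, using the facilities described in the parallel 
vignette Dirk mentions (a sketch, not a recipe):

    library(parallel)
    cl <- makeCluster(2)
    clusterSetRNGStream(cl, iseed = 123)   # L'Ecuyer-CMRG, one stream per worker
    res <- parLapply(cl, 1:4, function(i) runif(1))
    stopCluster(cl)

That is reproducible for a fixed cluster size; getting identical results 
regardless of the number of workers takes more care (for instance, one 
stream per task), which is the sort of constraint I mean.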

Paul


