[Rd] A bug in the R Mersenne Twister (RNG) code?

Martin Maechler maechler at stat.math.ethz.ch
Tue Sep 6 09:21:08 CEST 2016


>>>>> Dirk Eddelbuettel <edd at debian.org>
>>>>>     on Wed, 31 Aug 2016 10:30:07 -0500 writes:

    > On 30 August 2016 at 18:29, Duncan Murdoch wrote: 
    > I don't see evidence of a bug.  There have been several
    > versions of the  MT; we may be using a different version
    > than you are.  Ours is the  1999/10/28 version; the web
    > page you cite uses one from 2002.
    >  
    >  Perhaps the newer version fixes some problems, and then
    > it would be  worth considering a change.  But changing
    > the default RNG definitely  introduces problems in
    > reproducibility, so it's not obvious that we would do it.

    > Yep. FWIW the GNU GSL adopted the 2002 version a while ago
    > too. Quoting from
    > https://www.gnu.org/software/gsl/manual/html_node/Random-number-generator-algorithms.html

    > Generator: gsl_rng_mt19937

    >    The MT19937 generator of Makoto Matsumoto and Takuji
    > Nishimura is a variant of the twisted generalized feedback
    > shift-register algorithm, and is known as the “Mersenne
    > Twister” generator. It has a Mersenne prime period of
    > 2^19937 - 1 (about 10^6000) and is equi-distributed in 623
    > dimensions. It has passed the DIEHARD statistical
    > tests. It uses 624 words of state per generator and is
    > comparable in speed to the other generators. The original
    > generator used a default seed of 4357 and choosing s equal
    > to zero in gsl_rng_set reproduces this. Later versions
    > switched to 5489 as the default seed, you can choose this
    > explicitly via gsl_rng_set instead if you require it.

    >    For more information see,

    >       Makoto Matsumoto and Takuji Nishimura, “Mersenne
    > Twister: A 623-dimensionally equidistributed uniform
    > pseudorandom number generator”. ACM Transactions on
    > Modeling and Computer Simulation, Vol. 8, No. 1
    > (Jan. 1998), Pages 3–30 The generator gsl_rng_mt19937 uses
    > the second revision of the seeding procedure published by
    > the two authors above in 2002. The original seeding
    > procedures could cause spurious artifacts for some seed
    > values. They are still available through the alternative
    > generators gsl_rng_mt19937_1999 and gsl_rng_mt19937_1998.

    > Note the last sentence here.

    > This is all somewhat technical code, so when I noticed the
    > above I could never figure what exactly R was doing in its
    > implementation.  But "innocent until proven guilty" -- a
    > sufficient number of people ought to have looked at this
    > -- so I saw no need to pursue this further.

    > Dirk

This interesting thread went to sleep a bit early.

Let me summarize my view (another R-core after Duncan's) on this:

a) R's reference documentation, easily accessed with one of
     ?Random, ?RNG, ?.Random.seed  (and more)
  clearly indicates the reference for our  "Mersenne-Twister" (MT)
  as the  ' Matsumoto, M. and Nishimura, T. (1998) ... ' publication.
  and we are providing that.
  As Duncan said, there is no bug.

b) The extra information by Mark and notably Dirk indicates that
   "the litterature" showed that the 1998 version of MT has rare
   problems and the GSL, notably uses the 2002 version of MT.

c) I think it would be nice if we could provide that as well as
   another RNGkind().  I for one would be willing to integrate a
   well written patch proposal (what others call PR for "pull
   request", and I'd also call "code donation") into the R sources.

   Well-written for me would include to re-use / share code with
   the 1998 version as much as possible.

d) If we had both kinds, say "Mersenne-Twister" and "Mersenne-Twister_2002",
   we (the R community, not R core necessarily) could conduct
   experiments about the differences... and can contempltate the
   pros and cons of a potential switch of default.

e) A bit tangential to this:  Nowadays, there also exist faster
   gamma variate RNGs .. giving different random values - for
   rgamma() but also several other derived RNGs, notably
   rchisq(), and even more in other CRAN packages.

   With code donation there, we could introduce a new argument
   	     'gamma.kind'
   to the
		set.seed(see, kind = NULL, normal.kind = NULL)
		RNGkind (     kind = NULL, normal.kind = NULL)
   functions.
   (There, currently I'd guess we would not change the default).

---

Note that the above includes my guess that  R-core would not take
action unless we get patch proposals.
(But then, I'm happy if my guess is wrong here ...)

Martin Maechler,
ETH Zurich



More information about the R-devel mailing list