[Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

Tirthankar Chakravarty tirthankar.lists at gmail.com
Sun Nov 5 16:39:08 CET 2017


Duncan, Daniel,

Thanks, and indeed we intend to take the advice that Radford and Lukas have
provided in this thread.

I do want to reiterate that the generating system itself cannot have any
conception of the use of form IDs as seeds for a PRNG, *and* the system
itself only generates a sequence of form IDs, which are then filtered and
passed to our API depending on basic rules on user inputs in that form.
Either a truly remarkable low-probability event has happened in our
production system, or the Mersenne-Twister is very susceptible to the first
draw in the sequence being correlated across closely related seeds. Both
possibilities require understanding the Mersenne-Twister better.
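
As a rough sketch of how the second possibility could be examined, the
snippet below seeds the Mersenne-Twister with consecutive integers and looks
at the first draw from each. The helper name first_draw and the seed range
1:10000 are assumptions for illustration only; they stand in for the real
form IDs, which are not shown in this thread.

```r
# Hypothetical helper: the first runif() draw for a given integer seed.
first_draw <- function(seed) {
  set.seed(seed, kind = "Mersenne-Twister")
  runif(1, 17, 26)
}

# Assumed stand-in for a run of closely related form IDs.
seeds <- 1:10000
draws <- vapply(seeds, first_draw, numeric(1))

# Lag-1 correlation between the first draws of adjacent seeds.
lag1_cor <- cor(draws[-1], draws[-length(draws)])
print(lag1_cor)

# Spread of the first draws over the interval [17, 26].
print(summary(draws))
```

A correlation near zero over a block of seeds would not rule out bunching
for some particular subset of seeds, so this is only a first check.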

The solution here, as has been suggested, is to use a different RNG with
adequate burn-in (in which case even MT would work), or to look more
carefully at our problem and understand whether we just need a hash function.
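
The burn-in idea could be sketched roughly as follows. The function name
draw_after_burn_in and the default burn-in length of 100 are assumptions
made for illustration, not values recommended anywhere in this thread.

```r
# Hypothetical helper: seed the generator, discard the first `burn` draws,
# and return the next one. `burn = 100` is an arbitrary assumed default.
draw_after_burn_in <- function(seed, burn = 100) {
  set.seed(seed, kind = "Mersenne-Twister")
  runif(burn + 1, 17, 26)[burn + 1]
}

# The result is still a deterministic function of the seed.
x1 <- draw_after_burn_in(42)
x2 <- draw_after_burn_in(42)
print(identical(x1, x2))
```

Because the output remains a fixed function of the form ID, this behaves
like a (slow) hash of the ID rather than a source of fresh randomness.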

In either case, we will cease to question R's implementation of
Mersenne-Twister (for the time being). :)

T



On Sun, Nov 5, 2017 at 7:47 PM, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:

> On 04/11/2017 10:20 PM, Daniel Nordlund wrote:
>
>> Tirthankar,
>>
>> "random number generators" do not produce random numbers.  Any given
>> generator produces a fixed sequence of numbers that appear to meet
>> various tests of randomness.  By picking a seed you enter that sequence
>> in a particular place and subsequent numbers in the sequence appear to
>> be unrelated.  There are no guarantees that if YOU pick a SET of seeds
>> they won't produce a set of values that are of a similar magnitude.
>>
>> You can likely solve your problem by following Radford Neal's advice of
>> not using the first number from each seed.  However, you don't need
>> to use anything more than the second number.  So, you can modify your
>> function as follows:
>>
>> function(x) {
>>   set.seed(x, kind = "default")
>>   y <- runif(2, 17, 26)  # draw two values from the seeded stream
>>   y[2]                   # discard the first, return the second
>> }
>>
>> Hope this is helpful,
>>
>
> That's assuming that the chosen seeds are unrelated to the function
> output, which seems unlikely on the face of it.  You can certainly choose a
> set of seeds that give high values on the second draw just as easily as you
> can choose seeds that give high draws on the first draw.
>
> The interesting thing about this problem is that Tirthankar doesn't
> believe that the seed selection process is aware of the function output.  I
> would say that it must be, and he should be investigating how that happens
> if he is worried about the output; he shouldn't be worrying about R's RNG.
>
> Duncan Murdoch
>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
