[Rd] Extreme bunching of random values from runif with Mersenne-Twister seed
Tirthankar Chakravarty
tirthankar.lists at gmail.com
Sun Nov 5 16:39:08 CET 2017
Duncan, Daniel,
Thanks and indeed we intend to take the advice that Radford and Lukas have
provided in this thread.
I do want to re-iterate that the generating system itself cannot have any
conception of the use of form IDs as seeds for a PRNG *and* the system
itself only generates a sequence of form IDs, which are then filtered & are
passed to our API depending on basic rules on user inputs in that form.
Either a truly remarkable low-probability event has happened in our
production system, or the first draw of the Mersenne-Twister is strongly
correlated across closely related seeds. Either way, we need to
understand the Mersenne-Twister better.
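
One way to probe the second possibility empirically is to compare the
first draw across a run of consecutive seeds. The following is only a
rough sketch: the seed range 1:1000 is a hypothetical stand-in for our
form IDs, the range (17, 26) is taken from the function quoted later in
this message, and the two summary checks are just one reasonable choice.

```r
# Hypothetical seeds standing in for closely related form IDs (assumption).
seeds <- 1:1000

# First draw from runif() after seeding Mersenne-Twister with each seed.
# Note: set.seed(kind = ...) changes the session's RNG kind as a side effect.
first_draws <- vapply(seeds, function(s) {
  set.seed(s, kind = "Mersenne-Twister")
  runif(1, 17, 26)
}, numeric(1))

# Correlation between first draws from adjacent seeds (lag 1);
# a value near zero would argue against strong linear dependence.
lag1_cor <- cor(first_draws[-length(first_draws)], first_draws[-1])

# Rough uniformity check of the first draws over (17, 26).
ks <- ks.test(first_draws, "punif", min = 17, max = 26)

print(lag1_cor)
print(ks$p.value)
```

A check like this only looks at consecutive integers; it says nothing
about a set of seeds that was somehow selected with knowledge of the
output.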
The solution here, as has been suggested, is to use a different RNG with
adequate burn-in (in which case even MT would work) or to look more
carefully at our problem and understand if we just need a hash function.
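
As a minimal sketch of the burn-in option, assuming we keep
Mersenne-Twister and the (17, 26) range from the function quoted later
in this message: the name draw_with_burn_in and the parameter n_burn
(with its default of 100) are hypothetical and chosen only for
illustration.

```r
# Seeded draw with burn-in. `n_burn` is a hypothetical parameter giving
# the number of initial values to discard before the one we keep.
draw_with_burn_in <- function(seed, n_burn = 100L) {
  set.seed(seed, kind = "Mersenne-Twister")
  draws <- runif(n_burn + 1L, 17, 26)  # generate n_burn + 1 values
  draws[n_burn + 1L]                   # keep only the last one
}

# The result is still a deterministic function of the seed.
x <- draw_with_burn_in(42L)
print(x)
```

This keeps the mapping from form ID to value reproducible, but it does
not help if the seeds themselves depend on the function output.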
In either case, we will cease to question R's implementation of
Mersenne-Twister (for the time being). :)
T
On Sun, Nov 5, 2017 at 7:47 PM, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:
> On 04/11/2017 10:20 PM, Daniel Nordlund wrote:
>
>> Tirthankar,
>>
>> "random number generators" do not produce random numbers. Any given
>> generator produces a fixed sequence of numbers that appear to meet
>> various tests of randomness. By picking a seed you enter that sequence
>> in a particular place and subsequent numbers in the sequence appear to
>> be unrelated. There are no guarantees that if YOU pick a SET of seeds
>> they won't produce a set of values that are of a similar magnitude.
>>
>> You can likely solve your problem by following Radford Neal's advice of
>> not using the first number from each seed. However, you don't need
>> to use anything more than the second number. So, you can modify your
>> function as follows:
>>
>> function(x) {
>>   set.seed(x, kind = "default")
>>   y = runif(2, 17, 26)  # draw two values; the first is discarded
>>   return(y[2])
>> }
>>
>> Hope this is helpful,
>>
>
> That's assuming that the chosen seeds are unrelated to the function
> output, which seems unlikely on the face of it. You can certainly choose a
> set of seeds that give high values on the second draw just as easily as you
> can choose seeds that give high draws on the first draw.
>
> The interesting thing about this problem is that Tirthankar doesn't
> believe that the seed selection process is aware of the function output. I
> would say that it must be, and he should be investigating how that happens
> if he is worried about the output; he shouldn't be worrying about R's RNG.
>
> Duncan Murdoch
>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>