[Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

Lukas Stadler lukas.stadler at oracle.com
Fri Nov 3 11:04:59 CET 2017


If I interpret the original message as “I think there’s something wrong with R's random number generator”:
Your assumption is that the mapping from the seed to the first random number behaves like a good hash function, which it doesn’t.
E.g., with Mersenne Twister it’s a couple of multiplications, bit shifts, xors and ands, and the few bits that vary in your seeds end up in the less significant bits of the result.
Something like the “digest” package might be what you want; it provides proper hash functions.
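
A minimal sketch of that idea, assuming the digest package is installed: it hashes each ID with MD5 and maps the leading 28 bits of the hash directly onto the interval [17, 26), bypassing set.seed() and runif(). The helper name id_to_uniform and the choice of MD5 and 28 bits are illustrative assumptions, not something from the original post.

```r
# Hypothetical helper: map an ID string to a value in [lo, hi) via a hash.
# Assumes the 'digest' package is installed.
id_to_uniform <- function(id, lo = 17, hi = 26) {
  stopifnot(is.character(id), length(id) == 1L)
  # serialize = FALSE hashes the characters of the string itself
  hex <- digest::digest(id, algo = "md5", serialize = FALSE)
  # The first 7 hex digits give a 28-bit integer in [0, 2^28 - 1]
  u <- strtoi(substr(hex, 1L, 7L), base = 16L) / 2^28
  lo + (hi - lo) * u
}

ids <- c("86548915", "86551615", "86566163")
hashed_values <- vapply(ids, id_to_uniform, numeric(1))
print(hashed_values)
```

The same ID always yields the same value, and nearby IDs should not yield nearby values, which is the property the seed-to-first-draw mapping was being asked to provide.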

- Lukas

> On 3 Nov 2017, at 10:39, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> 
>>>>>> Tirthankar Chakravarty <tirthankar.lists at gmail.com>
>>>>>>    on Fri, 3 Nov 2017 13:19:12 +0530 writes:
> 
>> This is cross-posted from SO
>> (https://stackoverflow.com/q/47079702/1414455), but I now
>> feel that this needs someone from R-Devel to help
>> understand why this is happening.
> 
> Why R-devel -- R-help would have been appropriate:
> 
> It seems you have not read the help page for
> set.seed, as I would expect from posters to R-devel.
> Why would you use strings instead of integers if you *had* read it?
> 
>> We are facing a weird situation in our code when using R's
>> [`runif`][1] and setting seed with `set.seed` with the
>> `kind = NULL` option (which resolves, unless I am
>> mistaken, to `kind = "default"`; the default being
>> `"Mersenne-Twister"`).
> 
> again this is not what the help page says; rather
> 
> | The use of ‘kind = NULL’ or ‘normal.kind = NULL’ in ‘RNGkind’ or
> | ‘set.seed’ selects the currently-used generator (including that
> | used in the previous session if the workspace has been restored):
> | if no generator has been used it selects ‘"default"’.
> 
> but as you have > 90 (!!) packages in your sessionInfo() below,
> why should we (or you) know if some of the things you did
> before or (implicitly) during loading all these packages did not
> change the RNG kind ?
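
The behaviour described in the help-page excerpt can be checked directly. The sketch below deliberately switches the generator first (standing in for whatever earlier code or package loading might have done), then shows that set.seed() with the default kind = NULL keeps that generator, while an explicit kind pins it. The variable names are illustrative.

```r
old <- RNGkind()                 # remember the kinds currently in effect

RNGkind("Wichmann-Hill")         # stand-in for earlier code changing the generator
set.seed(1)                      # kind = NULL: keeps the currently-used generator
kind_after_null <- RNGkind()[1]

set.seed(1, kind = "Mersenne-Twister")   # explicit kind: independent of history
kind_after_explicit <- RNGkind()[1]

print(c(kind_after_null, kind_after_explicit))

RNGkind(kind = old[1], normal.kind = old[2])   # restore the original kinds
```

Calling RNGkind() with no arguments at the top of a report is a cheap way to record which generator was actually in use.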
> 
>> We set the seed using (8 digit) unique IDs generated by an
>> upstream system, before calling `runif`:
> 
>>    seeds = c( "86548915", "86551615", "86566163",
>> "86577411", "86584144", "86584272", "86620568",
>> "86724613", "86756002", "86768593", "86772411",
>> "86781516", "86794389", "86805854", "86814600",
>> "86835092", "86874179", "86876466", "86901193",
>> "86987847", "86988080")
> 
>> random_values = sapply(seeds, function(x) {
>>  set.seed(x)
>>  y = runif(1, 17, 26)
>>  return(y)
>> })
> 
> Why do you do that?
> 
> 1) You should set the seed *once*, not multiple times in one simulation.
> 
> 2) Assuming that your strings are correctly translated to integers
>   and the same on all platforms, independent of locales (!) etc,
>   you are again not following the simple instruction on the help page:
> 
>     ‘set.seed’ uses a single integer argument to set as many seeds as
>     are required.  It is intended as a simple way to get quite
>     different seeds by specifying small integer arguments, and also as
>     .....
>     .....
> 
> Note:   ** small ** integer 
> Why do you assume   86901193  to be a small integer ?
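
A minimal sketch of points 1) and 2), under the assumption that one value per ID is all that is needed: convert the IDs to integers explicitly, set the seed once with a small integer, and draw all values in a single call. The seed value 1L and the three-ID subset are arbitrary choices for illustration.

```r
ids <- c("86548915", "86551615", "86566163")

# Convert explicitly and fail loudly if any ID is not a valid integer
id_int <- as.integer(ids)
stopifnot(!anyNA(id_int))

set.seed(1L)                              # set the seed once, with a small integer
random_values <- runif(length(ids), min = 17, max = 26)
names(random_values) <- ids               # keep the ID-to-value association
print(random_values)
```

Note that this ties each value to the position of its ID in the vector, not to the ID itself, so it only fits if the set and order of IDs are fixed.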
> 
>> This gives values that are **extremely** bunched together.
> 
>>> summary(random_values)
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>   25.13   25.36   25.66   25.58   25.83   25.94
> 
>> This behaviour of `runif` goes away when we use `kind =
>> "Knuth-TAOCP-2002"`, and we get values that appear to be
>> much more evenly spread out.
> 
>>    random_values = sapply(seeds, function(x) {
>>      set.seed(x, kind = "Knuth-TAOCP-2002")
>>      y = runif(1, 17, 26)
>>      return(y)
>>    })
> 
>> *Output omitted.*
> 
>> ---
> 
>> **The most interesting thing here is that this does not
>> happen on Windows -- only happens on Ubuntu**
>> (`sessionInfo` output for Ubuntu & Windows below).
> 
>> # Windows output: #
> 
>>> seeds = c(
>> +   "86548915", "86551615", "86566163", "86577411", "86584144",
>> +   "86584272", "86620568", "86724613", "86756002", "86768593", "86772411",
>> +   "86781516", "86794389", "86805854", "86814600", "86835092", "86874179",
>> +   "86876466", "86901193", "86987847", "86988080")
>>> 
>>> random_values = sapply(seeds, function(x) {
>> +   set.seed(x)
>> +   y = runif(1, 17, 26)
>> +   return(y)
>> + })
>>> 
>>> summary(random_values)
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>   17.32   20.14   23.00   22.17   24.07   25.90
> 
>> Can someone help understand what is going on?
> 
>> Ubuntu
>> ------
> 
>> R version 3.4.0 (2017-04-21)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 16.04.2 LTS
> 
> You have not learned to get a current version of R.
> ===> You should not write to R-devel (sorry if this may sound harsh ..)
> 
> Hint:
>   We know that Ubuntu LTS, by virtue of being LTS (Long Term
>   Support), will not update R.
>   But the Ubuntu/Debian pages on CRAN tell you how to ensure that you
>   automatically get current versions of R on your Ubuntu-run computer
>   (namely by adding a CRAN mirror to your Ubuntu sources).
> 
> And then in your sessionInfo :
> 
>    ....
>       38 packages attached + 56 namespaces loaded !!
>    ....
> 
>   and similar nonsense (tons of packages+namespaces)
>   on Windows, which uses an even more outdated version of
>   R, 3.3.2.
> 
> -------------
> 
> Can you please learn to work with a minimal reproducible example (MRE)?
> (Well, you are close in your R code, but not if you load 50
> packages and do who-knows-what before running the example:
> your RNGkind() and many other things could have been changed ...)
> 
> Since you run ubuntu, you know the shell and you could
> (after installing a current version of R) put your MRE in a
> small *.R script and do
> 
>   R CMD BATCH --vanilla  MRE.R
> 
> which will produce MRE.Rout  with all input/output
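
For illustration, the contents of such an MRE.R might look like the sketch below. It records the R version and the generator in use before running a cut-down version of the reported code; the three-seed subset is an arbitrary shortening of the original vector, and the file layout is an assumption rather than anything prescribed in the thread.

```r
# MRE.R: run with  R CMD BATCH --vanilla MRE.R
R.version.string          # which R is actually running
RNGkind()                 # which generators are in effect in a fresh session

seeds <- c("86548915", "86551615", "86566163")

# Same pattern as in the report: re-seed from each ID, draw one value
random_values <- sapply(seeds, function(x) {
  set.seed(x)
  runif(1, 17, 26)
})

print(summary(random_values))
sessionInfo()             # no extra packages should show up under --vanilla
```

Because --vanilla skips saved workspaces and startup files, any difference that survives in MRE.Rout cannot be blamed on session history.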
> 
> BTW: Even on Windoze you can do similarly, once you've found the
> location of 'Rcmd.exe':
> 
>   ......\Rcmd BATCH --vanilla MRE.R
> 
> should work there as well and deliver MRE.Rout
> 
> - - - - -
> After doing all this, your problem may still be just
> because you are using much too large integers for the 'seed'
> argument of set.seed()
> 
> I really really strongly believe you should have used R-help
> instead of R-devel.
> 
> Best,
> Martin Maechler
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


