[Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

William Dunlap wdunlap at tibco.com
Fri Nov 3 17:28:37 CET 2017


The random numbers in a stream initialized with one seed should have about
the desired distribution.  You don't win by changing the seed all the
time.  Your seeds caused the first numbers of a bunch of streams to be
about the same, but the second and subsequent entries in each stream do
look uniformly distributed.
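
One way to check this for yourself is to keep the first two draws after each reseed and compare their spread. The following is only a sketch: it uses the seeds from the original post written as integers, and the names `draws`, `first` and `second` are mine. The exact numbers may depend on your platform and R version, so no output is shown.

```r
# Seeds from the original post, written as integers.
seeds <- c(86548915L, 86551615L, 86566163L, 86577411L, 86584144L,
           86584272L, 86620568L, 86724613L, 86756002L, 86768593L,
           86772411L, 86781516L, 86794389L, 86805854L, 86814600L,
           86835092L, 86874179L, 86876466L, 86901193L, 86987847L,
           86988080L)

# For each seed, restart the generator and keep the first two draws.
draws <- t(sapply(seeds, function(s) {
  set.seed(s)
  runif(2, 17, 26)
}))
colnames(draws) <- c("first", "second")

# Compare the spread of the first draws with that of the second draws.
print(apply(draws, 2, summary))
```

If the first column is bunched and the second is not, the problem is confined to the first value drawn after each call to set.seed().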

You didn't say what your 'upstream process' was, but it is easy to come up
with seeds that give about the same first value:

> Filter(function(s){set.seed(s);runif(1,17,26)>25.99}, 1:10000)
 [1]  514  532 1951 2631 3974 4068 4229 6092 6432 7264 9090
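
If the values do not have to be reproducible per ID, a simple alternative is to seed once and take all the values from a single stream. This is a minimal sketch under that assumption: the fixed seed is arbitrary, `ids` holds a few of the IDs from the original post, and each value depends on the position of its ID in the vector rather than on the ID itself.

```r
# A few of the IDs from the original post (kept as character labels only).
ids <- c("86548915", "86551615", "86566163")

# Seed the generator once; 20171103 is an arbitrary choice (an assumption).
set.seed(20171103)

# Draw one value per ID from the same stream and label it with the ID.
values <- setNames(runif(length(ids), 17, 26), ids)
print(values)
```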



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Nov 3, 2017 at 12:49 AM, Tirthankar Chakravarty <
tirthankar.lists at gmail.com> wrote:

> This is cross-posted from SO (https://stackoverflow.com/q/47079702/1414455),
> but I now feel that this needs someone from R-Devel to help understand why
> this is happening.
>
> We are facing a weird situation in our code when using R's [`runif`][1] and
> setting seed with `set.seed` with the `kind = NULL` option (which resolves,
> unless I am mistaken, to `kind = "default"`; the default being
> `"Mersenne-Twister"`).
>
> We set the seed using (8 digit) unique IDs generated by an upstream system,
> before calling `runif`:
>
>     seeds = c(
>       "86548915", "86551615", "86566163", "86577411", "86584144",
>       "86584272", "86620568", "86724613", "86756002", "86768593", "86772411",
>       "86781516", "86794389", "86805854", "86814600", "86835092", "86874179",
>       "86876466", "86901193", "86987847", "86988080")
>
>     random_values = sapply(seeds, function(x) {
>       set.seed(x)
>       y = runif(1, 17, 26)
>       return(y)
>     })
>
> This gives values that are **extremely** bunched together.
>
>     > summary(random_values)
>        Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       25.13   25.36   25.66   25.58   25.83   25.94
>
> This behaviour of `runif` goes away when we use `kind =
> "Knuth-TAOCP-2002"`, and we get values that appear to be much more evenly
> spread out.
>
>     random_values = sapply(seeds, function(x) {
>       set.seed(x, kind = "Knuth-TAOCP-2002")
>       y = runif(1, 17, 26)
>       return(y)
>     })
>
> *Output omitted.*
>
> ---
>
> **The most interesting thing here is that this does not happen on Windows
> -- only happens on Ubuntu** (`sessionInfo` output for Ubuntu & Windows
> below).
>
> # Windows output: #
>
>     > seeds = c(
>     +   "86548915", "86551615", "86566163", "86577411", "86584144",
>     +   "86584272", "86620568", "86724613", "86756002", "86768593", "86772411",
>     +   "86781516", "86794389", "86805854", "86814600", "86835092", "86874179",
>     +   "86876466", "86901193", "86987847", "86988080")
>     >
>     > random_values = sapply(seeds, function(x) {
>     +   set.seed(x)
>     +   y = runif(1, 17, 26)
>     +   return(y)
>     + })
>     >
>     > summary(random_values)
>        Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       17.32   20.14   23.00   22.17   24.07   25.90
>
> Can someone help understand what is going on?
>
> Ubuntu
> ------
>
>     R version 3.4.0 (2017-04-21)
>     Platform: x86_64-pc-linux-gnu (64-bit)
>     Running under: Ubuntu 16.04.2 LTS
>
>     Matrix products: default
>     BLAS: /usr/lib/libblas/libblas.so.3.6.0
>     LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
>
>     locale:
>     [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
>      [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
>      [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
>      [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
>      [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
>     [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
>
>     attached base packages:
>     [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
>     other attached packages:
>      [1] RMySQL_0.10.8               DBI_0.6-1
>      [3] jsonlite_1.4                tidyjson_0.2.2
>      [5] optiRum_0.37.3              lubridate_1.6.0
>      [7] httr_1.2.1                  gdata_2.18.0
>      [9] XLConnect_0.2-12            XLConnectJars_0.2-12
>     [11] data.table_1.10.4           stringr_1.2.0
>     [13] readxl_1.0.0                xlsx_0.5.7
>     [15] xlsxjars_0.6.1              rJava_0.9-8
>     [17] sqldf_0.4-10                RSQLite_1.1-2
>     [19] gsubfn_0.6-6                proto_1.0.0
>     [21] dplyr_0.5.0                 purrr_0.2.4
>     [23] readr_1.1.1                 tidyr_0.6.3
>     [25] tibble_1.3.0                tidyverse_1.1.1
>     [27] rBayesianOptimization_1.1.0 xgboost_0.6-4
>     [29] MLmetrics_1.1.1             caret_6.0-76
>     [31] ROCR_1.0-7                  gplots_3.0.1
>     [33] effects_3.1-2               pROC_1.10.0
>     [35] pscl_1.4.9                  lattice_0.20-35
>     [37] MASS_7.3-47                 ggplot2_2.2.1
>
>     loaded via a namespace (and not attached):
>      [1] splines_3.4.0      foreach_1.4.3      AUC_0.3.0          modelr_0.1.0
>      [5] gtools_3.5.0       assertthat_0.2.0   stats4_3.4.0       cellranger_1.1.0
>      [9] quantreg_5.33      chron_2.3-50       digest_0.6.10      rvest_0.3.2
>     [13] minqa_1.2.4        colorspace_1.3-2   Matrix_1.2-10      plyr_1.8.4
>     [17] psych_1.7.3.21     XML_3.98-1.7       broom_0.4.2        SparseM_1.77
>     [21] haven_1.0.0        scales_0.4.1       lme4_1.1-13        MatrixModels_0.4-1
>     [25] mgcv_1.8-17        car_2.1-5          nnet_7.3-12        lazyeval_0.2.0
>     [29] pbkrtest_0.4-7     mnormt_1.5-5       magrittr_1.5       memoise_1.0.0
>     [33] nlme_3.1-131       forcats_0.2.0      xml2_1.1.1         foreign_0.8-69
>     [37] tools_3.4.0        hms_0.3            munsell_0.4.3      compiler_3.4.0
>     [41] caTools_1.17.1     rlang_0.1.1        grid_3.4.0         nloptr_1.0.4
>     [45] iterators_1.0.8    bitops_1.0-6       tcltk_3.4.0        gtable_0.2.0
>     [49] ModelMetrics_1.1.0 codetools_0.2-15   reshape2_1.4.2     R6_2.2.0
>     [53] knitr_1.15.1       KernSmooth_2.23-15 stringi_1.1.5      Rcpp_0.12.11
>
>
>
> Windows
> -------
>
>     > sessionInfo()
>     R version 3.3.2 (2016-10-31)
>     Platform: x86_64-w64-mingw32/x64 (64-bit)
>     Running under: Windows >= 8 x64 (build 9200)
>
>     locale:
>     [1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252    LC_MONETARY=English_India.1252
>     [4] LC_NUMERIC=C                   LC_TIME=English_India.1252
>
>     attached base packages:
>     [1] graphics  grDevices utils     datasets  grid      stats     methods   base
>
>     other attached packages:
>      [1] bindrcpp_0.2         h2o_3.14.0.3         ggrepel_0.6.5        eulerr_1.1.0         VennDiagram_1.6.17
>      [6] futile.logger_1.4.3  scales_0.4.1         FinCal_0.6.3         xml2_1.0.0           httr_1.3.0
>     [11] wesanderson_0.3.2    wordcloud_2.5        RColorBrewer_1.1-2   htmltools_0.3.6      urltools_1.6.0
>     [16] timevis_0.4          dtplyr_0.0.1         magrittr_1.5         shiny_1.0.5          RODBC_1.3-14
>     [21] zoo_1.8-0            sqldf_0.4-10         RSQLite_1.1-2        gsubfn_0.6-6         proto_1.0.0
>     [26] gdata_2.17.0         stringr_1.2.0        XLConnect_0.2-12     XLConnectJars_0.2-12 data.table_1.10.4
>     [31] xlsx_0.5.7           xlsxjars_0.6.1       rJava_0.9-8          readxl_0.1.1         googlesheets_0.2.1
>     [36] jsonlite_1.5         tidyjson_0.2.1       RMySQL_0.10.9        RPostgreSQL_0.4-1    DBI_0.5-1
>     [41] dplyr_0.7.2          purrr_0.2.3          readr_1.1.1          tidyr_0.7.0          tibble_1.3.3
>     [46] ggplot2_2.2.0        tidyverse_1.0.0      lubridate_1.6.0
>
>     loaded via a namespace (and not attached):
>      [1] gtools_3.5.0         assertthat_0.2.0     triebeard_0.3.0      cellranger_1.1.0     yaml_2.1.14
>      [6] slam_0.1-40          lattice_0.20-34      glue_1.1.1           chron_2.3-48         digest_0.6.12.1
>     [11] colorspace_1.3-1     httpuv_1.3.5         plyr_1.8.4           pkgconfig_2.0.1      xtable_1.8-2
>     [16] lazyeval_0.2.0       mime_0.5             memoise_1.0.0        tools_3.3.2          hms_0.3
>     [21] munsell_0.4.3        lambda.r_1.1.9       rlang_0.1.1          RCurl_1.95-4.8       labeling_0.3
>     [26] bitops_1.0-6         tcltk_3.3.2          gtable_0.2.0         reshape2_1.4.2       R6_2.2.0
>     [31] bindr_0.1            futile.options_1.0.0 stringi_1.1.2        Rcpp_0.12.12.1
>
>   [1]: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/Uniform.html
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
