[Rd] Bias in R's random integers?
Ralf Stubner
ralf.stubner at daqana.com
Thu Sep 20 12:59:00 CEST 2018
On 9/20/18 1:43 AM, Carl Boettiger wrote:
> For a well-tested C algorithm, based on my reading of Lemire, the unbiased
> "algorithm 3" in https://arxiv.org/abs/1805.10941 is already part of the C
> standard library in OpenBSD and macOS (as arc4random_uniform), and in the
> GNU standard library. Lemire also provides C++ code in the appendix of his
> piece for both this and the faster "nearly divisionless" algorithm.
>
> It would be excellent if any R core members were interested in considering
> bindings to these algorithms as a patch, or might express expectations for
> how that patch would have to operate (e.g. re Duncan's comment about
> non-integer arguments to sample size). Otherwise, an R package binding
> seems like a good starting point, but I'm not the right volunteer.
It is difficult to do this in a package, since R does not expose the
random bits generated by its RNG. Only a double in (0,1) is available
via unif_rand(). With an external RNG, however, it is of course
possible. After reading about Lemire's work [1], I had planned to
integrate such an unbiased sampling scheme into the dqrng package, and I
have now started on that work. [2]
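For readers who have not looked at the paper, here is a minimal sketch of the multiply-and-reject idea behind Lemire's "nearly divisionless" method, written against a 32-bit generator. The names next_u32() and bounded_u32() are made up for illustration, and the xorshift32 generator is only a stand-in for "some source of 32 random bits" (it never returns 0, so it is not a perfect source). This is not the dqrng code.

```c
#include <stdint.h>

/* Hypothetical stand-in generator (xorshift32). Any function that
   returns 32 random bits could be used instead. */
static uint32_t rng_state = 2463534242u;

static uint32_t next_u32(void) {
    uint32_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    rng_state = x;
    return x;
}

/* Integer in [0, range), range > 0. The 64-bit product maps the 32-bit
   draw onto [0, range); draws that fall into the over-represented
   low part of the product are rejected and redrawn. */
static uint32_t bounded_u32(uint32_t range) {
    uint64_t m = (uint64_t)next_u32() * (uint64_t)range;
    uint32_t low = (uint32_t)m;
    if (low < range) {
        /* 2^32 mod range, computed in 32-bit arithmetic */
        uint32_t threshold = (0u - range) % range;
        while (low < threshold) {
            m = (uint64_t)next_u32() * (uint64_t)range;
            low = (uint32_t)m;
        }
    }
    return (uint32_t)(m >> 32);
}
```

The division needed for the threshold is only reached when the low half of the product is smaller than range, which is rare for small ranges. This needs the raw 32-bit output of the generator, which is exactly what unif_rand() does not give you.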
Using Duncan's example, the results look much better:
> library(dqrng)
> m <- (2/5)*2^32
> y <- dqsample(m, 1000000, replace = TRUE)
> table(y %% 2)
     0      1
500252 499748
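To see why this particular m is a hard case for a multiply-and-truncate scheme, consider an idealized model: a uniform with exactly 32 bits of resolution, u = k / 2^32, and output floor(m * u). With m = (2/5) * 2^32 this is floor(2k / 5) exactly. The sketch below counts even outputs under that model; the function name count_even_truncated() is made up, and the model is an assumption for illustration, not R's actual sampling code.

```c
#include <stdint.h>

/* Count even values of floor(m * u) for u = k / 2^32, k = 0..n-1,
   with m = (2/5) * 2^32, so that floor(m * u) == (2 * k) / 5. */
static uint64_t count_even_truncated(uint64_t n) {
    uint64_t even = 0;
    for (uint64_t k = 0; k < n; ++k) {
        uint64_t y = (2 * k) / 5;   /* exact integer form of floor(m * u) */
        if (y % 2 == 0) {
            ++even;
        }
    }
    return even;
}
```

Every block of five consecutive k yields three even values and two odd ones, so the model gives a 60/40 split by parity instead of 50/50. A 1-based result (adding 1 to each value) only swaps which parity is favoured.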
Currently I am taking the other interpretation of "truncated":
> table(dqsample(2.5, 1000000, replace = TRUE))
     0      1
499894 500106
I will adjust this to whatever is decided for base R.
So far there is neither long-vector nor weighted-sampling support, and
the performance without replacement is quite bad compared to R's
hashing-based algorithm.
cheerio
ralf
[1] via http://www.pcg-random.org/posts/bounded-rands.html
[2] https://github.com/daqana/dqrng/tree/feature/sample