[Rd] Bias in R's random integers?
Ralf Stubner
r@lf@@tubner @ending from d@q@n@@com
Thu Sep 27 21:58:34 CEST 2018
On 9/21/18 3:15 PM, Ralf Stubner wrote:
> Right, the RNGs in R produce no more than 32bits, so the conversion to a
> double can be reverted. If we ignore those RNGs that produce less than
> 32bits for the moment, then the attached file contains a sample
> implementation (without long vectors, weighted sampling or hashing). It
> uses Rcpp for convenience, but I have tried to keep the C++ low.
I just realized that the mailing list had stripped the attachment. I am
therefore including it at the end for reference. Meanwhile I have also
cast this into a package (feedback welcome):
https://www.daqana.org/dqsample/
cheerio
ralf
// [[Rcpp::plugins(cpp11)]]
#include <cstdint>
#include <Rcpp.h>
/* Daniel Lemire
Fast Random Integer Generation in an Interval
https://arxiv.org/abs/1805.10941
*/
uint32_t nearlydivisionless32(uint32_t s, uint32_t (*random32)(void)) {
uint32_t x = random32();
uint64_t m = (uint64_t) x * (uint64_t) s;
uint32_t l = (uint32_t) m;
if (l < s) {
uint32_t t = -s % s;
while (l < t) {
x = random32();
m = (uint64_t) x * (uint64_t) s;
l = (uint32_t) m;
}
}
return m >> 32;
}
uint32_t random32() {
return R::runif(0, 1) * 4294967296; /* 2^32 */
}
// [[Rcpp::export]]
Rcpp::IntegerVector sample_int(int n, int size, bool replace = false) {
Rcpp::IntegerVector result(Rcpp::no_init(size));
if (replace) {
for (int i = 0; i < size; ++i)
result[i] = nearlydivisionless32(n, random32) + 1;
} else {
Rcpp::IntegerVector tmp(Rcpp::no_init(n));
for (int i = 0; i < n; ++i)
tmp[i] = i;
for (int i = 0; i < size; ++i) {
int j = nearlydivisionless32(n, random32);
result[i] = tmp[j] + 1;
tmp[j] = tmp[--n];
}
}
return result;
}
/*** R
set.seed(42)
sample.int(6, 10, replace = TRUE)
sample.int(100, 10)
set.seed(42)
sample_int(6, 10, replace = TRUE)
sample_int(100, 10)
m <- ceiling((2/5)*2^32)
set.seed(42)
x <- sample.int(m, 1000000, replace = TRUE)
table(x %% 2)
set.seed(42)
y <- sample_int(m, 1000000, replace = TRUE)
table(y %% 2)
set.seed(42)
sample.int(m, 6, replace = TRUE)
set.seed(42)
sample_int(m, 6, replace = TRUE)
bench::mark(orig = sample.int(m, 1000000, replace = TRUE),
new = sample_int(m, 1000000, replace = TRUE),
check = FALSE)
bench::mark(orig = sample.int(6, 1000000, replace = TRUE),
new = sample_int(6, 1000000, replace = TRUE),
check = FALSE)
*/
--
Ralf Stubner
Senior Software Engineer / Trainer
daqana GmbH
Dortustraße 48
14467 Potsdam
T: +49 331 23 61 93 11
F: +49 331 23 61 93 90
M: +49 162 20 91 196
Mail: ralf.stubner using daqana.com
Sitz: Potsdam
Register: AG Potsdam HRB 27966 P
Ust.-IdNr.: DE300072622
Geschäftsführer: Prof. Dr. Dr. Karl-Kuno Kunze
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20180927/2648621a/attachment.sig>
More information about the R-devel
mailing list