[Rd] issue with sample in R 3.6.0.
Joseph Wood
jwood000 @end|ng |rom gm@||@com
Fri Mar 1 18:06:47 CET 2019
Hello,
I think there is an issue in the sampling rejection algorithm in R 3.6.0.
The do_sample2 function in src/main/unique.c still has 4.5e15 as an
upper limit, implying that numbers greater than INT_MAX are still to
be supported by sample in base R.
Please review the examples below:
set.seed(123)
max(sample(2^31, 1e5))
[1] 2147430096
set.seed(123)
max(sample(2^31 + 1, 1e5))
[1] 1
set.seed(123)
max(sample(2^32, 1e5))
[1] 1
set.seed(123)
max(sample(2^35, 1e5))
[1] 8
set.seed(123)
max(sample(2^38, 1e5))
[1] 64
set.seed(123)
max(sample(2^38, 1e5))
[1] 64
set.seed(123)
max(sample(2^42, 1e5))
[1] 1024
>From the above, we see that if N is greater than 2^31, then N is
bounded by (2^(ceiling(log2(N)) – 32)).
Looking at the source code to src/main/RNG.c, we have the following:
static double rbits(int bits)
{
int_least64_t v = 0;
for (int n = 0; n <= bits; n += 16) {
int v1 = (int) floor(unif_rand() * 65536);
v = 65536 * v + v1;
}
// mask out the bits in the result that are not needed
return (double) (v & ((1L << bits) - 1));
}
The last line has (v & ((1L << bits) - 1)) where v is declared as
int_least64_t. If you notice, we are operating on v with the long
integer literal 1L. I’m pretty sure this is the source of the issue.
By changing 1L to at least a 64 bit integer, it appears that we
correct the problem:
double rbits(int bits)
{
int_least64_t v = 0;
for (int n = 0; n <= bits; n += 16) {
int v1 = (int) floor(unif_rand() * 65536);
v = 65536 * v + v1;
}
int_least64_t one64 = 1L;
// mask out the bits in the result that are not needed
return (double) (v & ((one64 << bits) - 1));
}
Regards,
Joseph Wood
More information about the R-devel
mailing list