[Rd] Bias in R's random integers?
Steve Grubb
sgrubb at redhat.com
Fri Sep 21 23:38:05 CEST 2018
On Friday, September 21, 2018 5:28:38 PM EDT Ralf Stubner wrote:
> On 9/21/18 6:38 PM, Tierney, Luke wrote:
> > Not sure what should happen theoretically for the code in vseq.c, but
> > I see the same pattern with the R generators I tried (default,
> > Super-Duper, and L'Ecuyer) and with bash $RANDOM using
> >
> > N <- 10000
> > X1 <- replicate(N, as.integer(system("bash -c 'echo $RANDOM'", intern = TRUE)))
> > X2 <- replicate(N, as.integer(system("bash -c 'echo $RANDOM'", intern = TRUE)))
> > X <- X1 + 2 ^ 15 * (X2 > 2^14)
> >
> > and with numbers from random.org
> >
> > library(random)
> > X <- randomNumbers(N, 0, 2^16-1, col = 1)
> >
> > So I'm not convinced there is an issue.
>
> There is an issue, but it is in vseq.c.
>
> The plot I found striking was this:
>
> http://people.redhat.com/sgrubb/files/r-random.jpg
>
> It shows a scatter plot that is bounded to some rectangle where the
> upper right and lower left corner are empty. Roughly speaking, X and Y
> correspond to *consecutive differences* between random draws. It is
> obvious that differences between random draws are bounded by the range
> of the RNG, and that there cannot be two *differences in a row* that are
> close to the maximum (or minimum). Hence the expected shape for such a
> scatter plot is a rectangle with two corners being forbidden.
>
> Within the allowed region, there should be no structure whatsoever
> (given enough draws). And that is what was striking about the above
> picture: it showed clear vertical bands which should not be there. MT
> does fail some statistical tests, but it cannot be brought down that
> easily.
>
> Interestingly, I first used Dirk's C++ function for convenience, and
> that did *not* show these bands. But when I compiled vseq.c I could
> reproduce this. To cut this short: There is an error in vseq.c when the
> numbers are read in:
>
> tmp = strtoul(buf, NULL, 16);
>
> The third argument to strtoul is the base in which the numbers should be
> interpreted. However, R has written numbers with base 10. Those can be
> interpreted as base 16, but they will mean something different. Once one
> changes the above line to
>
> tmp = strtoul(buf, NULL, 10);
>
> the bands do disappear.
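
To make the base mismatch concrete, here is a minimal sketch. It is not the code from vseq.c; parse_draw and show_misread are hypothetical helpers introduced only for illustration. It parses the same decimal text once as base 16 and once as base 10. Text made only of the digits 0-9 is still valid hex, so strtoul reports no error, but every value whose hex form contains a digit a-f can never be produced, which is consistent with the gaps seen in the plot.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from vseq.c): parse a line of text as an
 * unsigned integer in the given base, using the same strtoul call
 * that is quoted above. */
unsigned long parse_draw(const char *buf, int base)
{
    assert(buf != NULL);
    return strtoul(buf, NULL, base);
}

/* Hypothetical helper: print how the same decimal text is read under
 * base 16 (the bug) and base 10 (the fix). */
void show_misread(const char *buf)
{
    printf("%-6s  base 16: %7lu   base 10: %7lu\n",
           buf, parse_draw(buf, 16), parse_draw(buf, 10));
}
```

Calling show_misread("12345") from a main function prints 74565 for base 16 and 12345 for base 10. Single-digit inputs such as "9" agree in both bases, so the misreading only shows up once the numbers have two or more digits.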
Yes. I just discovered the problem also. I was looking at how my bash script
worked fine and how the example Luke gave had a problem. I was using the
print command to keep things in hex. A corrected copy was uploaded so no one
else runs across this.
Best Regards,
-Steve