[Rd] Bias in R's random integers?

Steve Grubb sgrubb at redhat.com
Fri Sep 21 23:38:05 CEST 2018


On Friday, September 21, 2018 5:28:38 PM EDT Ralf Stubner wrote:
> On 9/21/18 6:38 PM, Tierney, Luke wrote:
> > Not sure what should happen theoretically for the code in vseq.c, but
> > I see the same pattern with the R generators I tried (default,
> > Super-Duper, and L'Ecuyer) and with bash $RANDOM using
> > 
> > N <- 10000
> > X1 <- replicate(N, as.integer(system("bash -c 'echo $RANDOM'", intern = TRUE)))
> > X2 <- replicate(N, as.integer(system("bash -c 'echo $RANDOM'", intern = TRUE)))
> > X <- X1 + 2 ^ 15 * (X2 > 2^14)
> > 
> > and with numbers from random.org
> > 
> > library(random)
> > X <- randomNumbers(N, 0, 2^16-1, col = 1)
> > 
> > So I'm not convinced there is an issue.
> 
> There is an issue, but it is in vseq.c.
> 
> The plot I found striking was this:
> 
> http://people.redhat.com/sgrubb/files/r-random.jpg
> 
> It shows a scatter plot that is bounded to some rectangle where the
> upper right and lower left corner are empty. Roughly speaking, X and Y
> correspond to *consecutive differences* between random draws. It is
> obvious that differences between random draws are bounded by the range
> of the RNG, and that there cannot be two *differences in a row* that are
> close to the maximum (or minimum). Hence the expected shape for such a
> scatter plot is a rectangle with two corners being forbidden.
> 
> Within the allowed region, there should be no structure whatsoever
> (given enough draws). And that was what was striking about the above
> picture: It showed clear vertical bands which should not be there. MT
> does fail some statistical tests, but it cannot be brought down that
> easily.
> 
> Interestingly, I first used Dirk's C++ function for convenience, and
> that did *not* show these bands. But when I compiled vseq.c I could
> reproduce this. To cut this short: There is an error in vseq.c when the
> numbers are read in:
> 
>     tmp = strtoul(buf, NULL, 16);
> 
> The third argument to strtoul is the base in which the numbers should be
> interpreted. However, R has written numbers with base 10. Those can be
> interpreted as base 16, but they will mean something different. Once one
> changes the above line to
> 
>     tmp = strtoul(buf, NULL, 10);
> 
> the bands do disappear.

Yes, I just discovered the problem as well. I was looking at why my bash
script worked fine while the example Luke gave had a problem: I was using the
print command to keep things in hex. A corrected copy has been uploaded so no
one else runs across this.

Best Regards,
-Steve


