[R] testing randomness of random number generators with student t-test?

Petr Savicky savicky at praha1.ff.cuni.cz
Thu Feb 3 13:29:06 CET 2011


On Wed, Feb 02, 2011 at 06:01:36PM -0500, Carl Witthoft wrote:
> Hi, subject more or less says it all.
> 
> I freely admit to not having bothered to find some of the online papers 
> about method of testing the quality of random number generators -- but 
> in an idle moment I wondered what to expect from something like the 
> following:
> 
> 
> randa<-runif(1000)
> randb<-runif(1000)
> t.test(randa,randb)$p.value
> var.test(randa,randb)$p.value
> 
> [repeat ad nauseam]
> 
> 
> Is the range of p-values I get in any way related to the "quality" of the 
> random number generator?

Hi.

As already explained, the result of t.test() in this case is consistent
with the good quality of the Mersenne Twister generator used in R.
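
A minimal sketch of what "consistent" means here, assuming the repeated
experiment from the question: if the generator behaves well, the p-values
of t.test() on two independent uniform samples should themselves be
approximately uniform on (0, 1). The seed and the number of repetitions
are arbitrary choices made for this illustration only.

```r
# Repeat the experiment from the question many times and collect p-values.
set.seed(42)                      # arbitrary seed, only for reproducibility
pvals <- replicate(2000, t.test(runif(1000), runif(1000))$p.value)

# Under the null hypothesis, about 5% of the p-values fall below 0.05.
mean(pvals < 0.05)

# Compare the p-values with the uniform distribution on (0, 1).
ks.test(pvals, "punif")
```

A markedly non-uniform set of p-values would indicate a problem, but
passing this check is only weak evidence, since even poor generators
usually pass it.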

The situation is slightly more complicated with ks.test() due to
the 32-bit precision of the random numbers, as discussed in the
Note section of ?RNGkind. For example

  n <- 100000
  ks.test(runif(n), runif(n))

typically produces a warning due to ties. This is not related to the
quality of the randomness. The reason is that the random numbers
have only 32 bits, so by the birthday paradox a collision already occurs
among 2^16 numbers with probability about 0.39. To account for this, the
null hypothesis should be changed to a uniform distribution on the
numbers in (0, 1) with at most 32 bits.
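
The birthday-paradox figure can be checked with a short calculation.
This is only a sketch: p_collision is a helper name introduced here, and
it uses the standard approximation 1 - exp(-n(n-1)/(2N)) for n draws from
N equally likely values, assuming N = 2^32.

```r
# Approximate probability of at least one tie among n draws
# from N equally likely values (birthday-paradox approximation).
p_collision <- function(n, N = 2^32) {
  -expm1(-n * (n - 1) / (2 * N))
}

p_collision(2^16)    # about 0.39
p_collision(2e5)     # pooled size of the two samples above: close to 1

# Empirical check on one pooled sample of the same size.
x <- runif(2e5)
anyDuplicated(x) > 0
```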

See section Random Number Generators of CRAN Task View Probability
Distributions by Christophe Dutang for information on CRAN packages
related to random numbers.

As far as I know, the only tests that can distinguish Mersenne Twister 
numbers from truly random ones are linear complexity tests mod 2. This
is discussed, for example, in section 7 Conclusion, Future Work, and
Open Issues in
  http://www.iro.umontreal.ca/~lecuyer/myftp/papers/horms.pdf
by P. L'Ecuyer.

Applications that do not use bitwise mod 2 (XOR) operations are
very unlikely to interfere with the linear tests mod 2. On the other hand,
if bitwise XOR is used, then Mersenne Twister numbers may be predicted,
because the generator is defined using the XOR operation and the history
of the last 624 numbers. A simple demonstration of this known predictability
is contained in
  http://www.cs.cas.cz/~savicky/predict_MT/predict_MT.R
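
The following sketch does not reproduce the script linked above and does
not predict anything from observed output. It only illustrates, assuming
the default Mersenne Twister generator, that the whole state consists of
624 32-bit words plus a position, and that the output is a deterministic
function of that state.

```r
RNGkind("Mersenne-Twister")
set.seed(2011)                    # arbitrary seed

# .Random.seed holds a kind code, the current position, and 624 state words.
state <- get(".Random.seed", envir = globalenv())
length(state)                     # 626

a <- runif(5)

# Restoring the saved state reproduces exactly the same numbers.
assign(".Random.seed", state, envir = globalenv())
b <- runif(5)
identical(a, b)                   # TRUE
```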

At first glance, this may look very bad. On the other hand, if there
is a relatively simple smooth function of 625 real variables whose
expected value differs measurably between Mersenne Twister numbers and
truly random ones, then this is likely to be an interesting mathematical
discovery.

Petr Savicky.


