[Rd] portable parallel seeds project: request for critiques

Petr Savicky savicky at cs.cas.cz
Fri Mar 2 12:24:41 CET 2012


On Fri, Mar 02, 2012 at 10:36:14AM +0100, Karl Forner wrote:
[...]
> Hello,
> I would be also in favor for using multiple seeds based on (seed,
> task_number) for convenience (i.e. avoiding storing the seeds)
> and with the possibility of having a dynamic number of tasks, but I am mot
> sure it is theoretically correct.
> But I can refer you to this article:
> http://www.agner.org/random/ran-instructions.pdf , section 6.1
> where the author states:
> 
> For example, if we make 100 streams of 10^10 random numbers each from an
> > SFMT
> > generator with cycle length ρ = 2^11213, we have a probability of overlap
> > p ≈ 10^3362.
> >
> 
> What do you think ? I am very concerned by the correctness of this approach
> so would appreciate any advice on that matter.

Hi.

If correctness is crucial, then the literature on cryptography provides
the best answers. At the current state of knowledge, it is not possible
to prove that a generator is undistinguishable from truly random numbers
in a mathematical sense. The problem is that if we can prove this for
any generator, then the proof also implies P \not= NP. This is an open
problem, so proving that any generator is correct in the sense of
undistinguishability is also open.

The best, what we can have, is a generator, which cannot be distinguished
from truly random numbers using the known methods, although experts on
cryptography tried to find such a method intensively. An example of such a
generator is Fortuna generator using AES, which is available at CRAN as "randaes".

  http://en.wikipedia.org/wiki/Fortuna_PRNG

The same can be said about initializations. Initialization is a random
number generator, whose output is used as the initial state of some
other generator. There is no proof that a particular initialization cannot
be distinguished from truly random numbers in a mathematical sense for
the same reason as above.

A possible strategy is to use a cryptographically strong hash function
for the initialization. This means to transform the seed to the initial
state of the generator using a function, for which we have a good
guarantee that it produces output, which is computationally hard to
distinguish from truly random numbers. For this purpose, i suggest
to use the package rngSetSeed provided currently at

  http://www.cs.cas.cz/~savicky/randomNumbers/

It is based on AES and Fortuna similarly as "randaes", but these
components are used only for the initialization of Mersenne-Twister.
When the generator is initialized, then it runs on its usual speed.

In the notation of

  http://www.agner.org/random/ran-instructions.pdf

using rngSetSeed for initialization of Mersenne-Twister is Method 4
in Section 6.1.

I appreciate comments.

Petr Savicky.

P.S. I included some more comments on the relationship of provably good
random number generators and P ?= NP question to the end of the page

  http://www.cs.cas.cz/~savicky/randomNumbers/



More information about the R-devel mailing list