[R-sig-hpc] parallel library's random numbers

Ross Boylan ross at biostat.ucsf.edu
Thu Dec 19 22:28:41 CET 2013


Conceptually, I have 2,000 tasks and I want each task to have access to
an independent, repeatable random number stream.  Many of the tasks will
run on the same node.  For example, a typical job would be to do tasks
200 to 400.

I might do one run with 10 nodes and a later one with 50, but I want the
same streams for each task.

The facilities in the parallel package seem designed to achieve
repeatable randomization for a fixed number of nodes.  Is there a way to
get them to do what I want?
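
One possible approach, sketched here under assumptions rather than taken
from the package documentation: build one L'Ecuyer-CMRG seed per task up
front by stepping nextRNGStream from a single master seed, and have each
task install its own seed before doing any work.  The helper names
make_task_seeds and run_task, the master seed value, and the runif(3)
stand-in for real work are all hypothetical.

```r
library(parallel)

# Hypothetical helper: one L'Ecuyer-CMRG seed per task, derived from a single
# master seed. The result depends only on master_seed and n_tasks, not on
# how many nodes later run the tasks.
make_task_seeds <- function(n_tasks, master_seed = 12345L) {
  RNGkind("L'Ecuyer-CMRG")      # note: changes the RNG kind of this session
  set.seed(master_seed)
  seeds <- vector("list", n_tasks)
  s <- .Random.seed
  for (i in seq_len(n_tasks)) {
    seeds[[i]] <- s
    s <- nextRNGStream(s)       # seed at the start of the next stream
  }
  seeds
}

# Hypothetical task wrapper: install the task's seed, then draw numbers.
run_task <- function(task_id, seeds) {
  assign(".Random.seed", seeds[[task_id]], envir = .GlobalEnv)
  runif(3)                      # stand-in for the real work of the task
}

seeds  <- make_task_seeds(2000)
first  <- run_task(250, seeds)
second <- run_task(250, seeds)
identical(first, second)        # same task, same stream
```

With a cluster object cl, something like
parLapply(cl, 200:400, run_task, seeds = seeds) would then give each task
its own stream regardless of how many nodes cl has, since the stream is
tied to the task number rather than to the node.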

The documentation does not explain what nextRNGStream and
nextRNGSubStream do, nor how they relate to the initialization done by
clusterSetRNGStream.  For example, if I call nextRNGStream on node 1, do
I get the same stream as is being used on node 2?

The documentation does not say that calling nextRNGStream actually
resets the seed, though that seems to be implicit in the example (and is
explicit in one book I found on the net).  I'm also unsure if calling
mc.reset.stream() is necessary after calling clusterSetRNGStream.
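
A small check that can be run in a single session, offered as a sketch
rather than as a statement of the documented contract: nextRNGStream and
nextRNGSubStream take a seed vector and return a new seed vector, and the
session's .Random.seed can be compared before and after the calls.  The
variable names and the seed value 1 are illustrative only.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")
set.seed(1)
before <- .Random.seed

next_stream    <- nextRNGStream(before)     # seed for the next stream
next_substream <- nextRNGSubStream(before)  # seed for the next sub-stream

# Compare the session's generator state with its state before the calls.
unchanged <- identical(.Random.seed, before)

# To actually switch streams, the returned seed is installed explicitly.
assign(".Random.seed", next_stream, envir = .GlobalEnv)
switched <- identical(.Random.seed, next_stream)
```

If unchanged comes out TRUE, the functions only compute a seed and the
reset happens in the explicit assignment to .Random.seed, which matches
the pattern used in the package's example.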

Ross Boylan

P.S. Using R 3.0.1


