[R-sig-hpc] parallel library's random numbers

Ross Boylan ross at biostat.ucsf.edu
Thu Dec 19 23:08:43 CET 2013


On Thu, 2013-12-19 at 13:28 -0800, Ross Boylan wrote:
> Conceptually, I have 2,000 tasks and I want each task to have access to
> an independent, repeatable random number stream.  Many of the tasks will
> run on the same node.  For example, a typical job would be to do tasks
> 200 to 400.
> 
> I might do one run with 10 nodes and a later one with 50, but I want the
> same streams for each task.
> 
> The facilities in parallel seem designed to achieve repeatable
> randomization for a fixed number of nodes.  Is there a way to get them
> to do what I want?

I'm thinking maybe I should just ignore the parallel package's random
number facilities and call set.seed(t) before task t.  It's easy to
imagine that this is not an entirely safe method, however.
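
An alternative to set.seed(t) that keeps one stream per task, regardless
of how many nodes are used, might look like the sketch below.  It assumes
the "L'Ecuyer-CMRG" generator and precomputes one seed per task by
calling nextRNGStream repeatedly.  The base seed 2013, the names seeds
and run_task, and the runif(3) stand-in for the real work are all
hypothetical.

```r
library(parallel)

# Switch to the generator that nextRNGStream() works with.
RNGkind("L'Ecuyer-CMRG")
set.seed(2013)                      # arbitrary base seed (assumption)

n_tasks <- 2000
seeds <- vector("list", n_tasks)
s <- .Random.seed
for (t in seq_len(n_tasks)) {
  seeds[[t]] <- s                   # seed for task t
  s <- nextRNGStream(s)             # advance to the next stream
}

# Hypothetical task wrapper: install task t's seed, then do the work.
run_task <- function(t) {
  assign(".Random.seed", seeds[[t]], envir = .GlobalEnv)
  runif(3)                          # stand-in for the real computation
}

# Tasks 200 to 400, as in the example job; the result for a given task
# depends only on t, not on which node runs it.
res <- lapply(200:400, run_task)
```

On a cluster, seeds[[t]] would have to be sent to the worker along with
task t; the sketch above runs serially to keep it self-contained.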

Ross
> 
> The documentation does not explain what nextRNGStream and
> nextRNGSubStream do nor how that relates to the initialization done by
> clusterSetRNGStream.  For example, if I do nextRNGStream on node 1, do I
> get the same stream as is being used on node 2?
> 
> The documentation does not say that calling nextRNGStream actually
> resets the seed, though that seems to be implicit in the example (and is
> explicit in one book I found on the net).  I'm also unsure if calling
> mc.reset.stream() is necessary after calling clusterSetRNGStream.
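
One way to check the seed question empirically is the small experiment
below.  It is only a sketch, assuming the "L'Ecuyer-CMRG" generator and
an arbitrary base seed of 1; the variable names are made up.  It tests
whether nextRNGStream changes .Random.seed by itself or merely returns a
new seed that has to be installed explicitly.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")
set.seed(1)                          # arbitrary base seed (assumption)

before <- .Random.seed
nxt <- nextRNGStream(before)         # seed vector for the next stream

# Did the call alter the current RNG state?
unchanged <- identical(before, .Random.seed)
# Is the returned seed different from the one passed in?
differs <- !identical(nxt, before)

# Installing the returned seed explicitly makes the draws repeatable.
assign(".Random.seed", nxt, envir = .GlobalEnv)
x1 <- runif(2)
assign(".Random.seed", nxt, envir = .GlobalEnv)
x2 <- runif(2)

print(c(unchanged = unchanged, differs = differs,
        repeatable = identical(x1, x2)))
```

If unchanged comes out TRUE, the call only computes a seed and the
caller is responsible for assigning it to .Random.seed.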
> 
> Ross Boylan
> 
> P.S. Using R 3.0.1