[R-sig-hpc] creating many separate streams (more streams than nodes)

Ross Boylan ross at biostat.ucsf.edu
Wed Apr 20 00:17:44 CEST 2011


On Tue, 2011-04-19 at 11:21 -0500, Paul Johnson wrote:
> Hi, everybody:
> 
> I sat down to write a quick question, and it is turning into a very
> complicated mess.  I will give you the blunt question. And then I'll
> leave all those other words I had written so that future Googlers
> might learn from my experience. Or you might understand me better.
> 
> I want to spawn 8000 random generator streams and save their initial
> states. Then I want to be able to re-run simulations against those
> generators and get the same results, whether or not I'm on just one
> machine or on a cluster.
> 
> The treatment of random seeds in packages like SNOW or Rmpi is aimed
> at initializing the cluster nodes, rather than initializing the
> separate streams.  My cluster has 60 nodes, and so the 60 separate
> streams from SNOW will not suffice.  So I'm reading code in rlecuyer
> and rstream to see how this might be done.  My background in
> agent-based simulation gives me some experience with serialization of
> PRNG objects, but R makes it very tough to see through the veil
> because there are hidden global objects like the [...]

I went down this path; the trick is to avoid thinking of initializing
the streams on a per-machine basis and instead to do it for each
chunk/job/stream.  I used rsprng, but others said rlecuyer would work
the same way.  To be careful, I freed the stream generator at the end
of each job.

So, if you have 8,000 jobs, you use the job number as an index and
initialize that job's stream by declaring it to be stream number
(job #) out of a total of 8,000 streams.
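
For concreteness, here is a minimal sketch of that scheme, assuming
rsprng's init.sprng(nstream, streamno, seed) / free.sprng() interface;
n.jobs and job.id are placeholder names for whatever your scheduler
actually provides:

  library(rsprng)

  n.jobs <- 8000
  job.id <- 17   # e.g. taken from the scheduler's task index

  ## Initialize this job's stream as stream number job.id - 1 (SPRNG
  ## streams are numbered from 0) out of n.jobs independent streams,
  ## all derived from one fixed seed.
  init.sprng(nstream = n.jobs, streamno = job.id - 1, seed = 42)

  draws <- runif(5)  # ordinary R RNG calls now draw from this stream

  ## Release the stream generator at the end of the job.
  free.sprng()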

You never need to save the stream state, since you know how you created
it.
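
In other words, the stream is a pure function of (number of streams,
stream number, seed), so re-initializing with the same arguments
reproduces the draws exactly.  Continuing the hypothetical sketch
above:

  init.sprng(nstream = 8000, streamno = 16, seed = 42)
  a <- runif(3)
  free.sprng()

  init.sprng(nstream = 8000, streamno = 16, seed = 42)
  b <- runif(3)
  free.sprng()

  identical(a, b)  # TRUE: re-creating the stream reproduces the draws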

I had some earlier posts about this, on this list I think, but that's
the basic idea.

Ross Boylan
