[R] clusterCall with replicate function
Michael Gormley
mpg33 at drexel.edu
Tue Aug 21 21:36:03 CEST 2007
I am trying to run a Monte Carlo process using snow with an MPI cluster. I
have about thirty processors available, and I want to run the algorithm 5000
times and take the average of the output. A very simple way to do this is
to divide 5000 by the number of processors to get a number n and tell each
processor to run the algorithm n times. I realize there are more efficient
ways to manage the parallelization. To implement this I used clusterCall
with the replicate function, along the lines of
clusterCall(cl, replicate, n, function(args)). Because my function is a
Monte Carlo process, it relies on drawing from random distributions to
generate output. When I do this, all of my processors generate the same
random numbers. I copied the following from the console as a simple
example:
cl <- makeCluster(2, type = "MPI")
clusterCall(cl, replicate, 2, runif(2))
[[1]]
          [,1]      [,2]
[1,] 0.6533959 0.6533959
[2,] 0.1071051 0.1071051

[[2]]
          [,1]      [,2]
[1,] 0.6533959 0.6533959
[2,] 0.1071051 0.1071051
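For what it is worth, the same pattern of identical columns shows up without any cluster when replicate receives a value that has already been evaluated. The following is a minimal sketch, assuming that clusterCall evaluates runif(2) on the master before sending it to the nodes (my assumption, not something I have checked against the snow source). Here do.call is only a local stand-in for the remote call, and the seed is arbitrary:

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable

# do.call() evaluates runif(2) once, then hands the resulting numbers to
# replicate(), which can only repeat that fixed vector.
fixed <- do.call(replicate, list(2, runif(2)))

# Called directly, replicate() re-evaluates the expression for each column.
fresh <- replicate(2, runif(2))

identical(fixed[, 1], fixed[, 2])  # TRUE: both columns are the same draw
identical(fresh[, 1], fresh[, 2])  # FALSE: independent draws
```

If that assumption holds, it would also explain why every node returns exactly the same matrix regardless of its seed.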
Using clusterApply to set a different random seed on each processor does not
help, so the problem seems to be related to using replicate within
clusterCall. I have rearranged the code so that replicate calls clusterCall
instead, i.e. replicate(2, clusterCall(cl, runif, 2), simplify=FALSE), and
that resolves the random number issue. However, it also involves much more
communication between master and slaves and results in slower computation.
Will rsprng fix this problem? Is there a better way to do this without
using replicate?
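One variant I could try is to wrap replicate in a function, so that the random draws happen on each node rather than on the master, and to give each node its own random number stream. The sketch below is only that: it uses the base parallel package with a local socket cluster as a stand-in for snow with MPI (an assumption made so it runs on one machine), clusterSetRNGStream in place of rsprng, and a hypothetical one_run function as a placeholder for the real algorithm:

```r
library(parallel)  # stand-in for snow; an assumption so the sketch runs locally

# Hypothetical placeholder for one run of the real Monte Carlo algorithm.
one_run <- function() mean(runif(2))

total_runs <- 5000
n_workers  <- 2                          # would be about 30 on the MPI cluster
n_each     <- total_runs %/% n_workers   # runs assigned to each worker

cl <- makeCluster(n_workers)             # socket cluster on the local machine
clusterSetRNGStream(cl, iseed = 123)     # separate L'Ecuyer stream per worker

# The wrapper function is shipped to each worker and evaluated there,
# so replicate() calls one_run() afresh on the worker for every repetition.
per_worker <- clusterCall(cl, function(n, f) replicate(n, f()), n_each, one_run)

stopCluster(cl)

estimate <- mean(unlist(per_worker))     # average over all runs
```

This keeps a single round trip per node, unlike the replicate(2, clusterCall(...)) arrangement. When the total does not divide evenly by the number of nodes, the integer division drops the remainder, so a few runs would need to be assigned separately.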
I hope this is somewhat clear.
Thanks,
Mike