[R-sig-hpc] request help with replication and snowFT

Hana Sevcikova hanas at uw.edu
Sun Jul 24 06:06:29 CEST 2011


Paul,
> Could you please explain more about the intended use of
> performParallel?
To make parallel computing in R as easy as possible.

> I tried, but could not find a convenient way to send
> functions to the nodes before the main function would run.  I could
> build everything into a package and install that on the nodes, but for
> development and testing, that makes for a pretty tedious process.
You can simply define the helper functions inside the main function. In 
your example:

## The main function of interest, with the helpers nested inside it
## so that nothing extra needs to be exported to the nodes
myNorm <- function(x) {
  myA <- function(x) {
    2 * x
  }
  myB <- function(x) {
    3 * x
  }
  myC <- function(x, y) {
    x + y
  }
  whew    <- myA(x)
  whewyou <- myB(whew)
  whewwho <- myC(whew, whewyou)
  y <- rnorm(whewwho)
  list(x, whew, whewyou, whewwho, y, sum(y))
}
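
You can sanity-check the nested version serially before going parallel 
(just a quick local test, nothing snowFT-specific):

myx <- sample(1:8, 33, replace = TRUE)
res.serial <- lapply(myx, myNorm)   # runs on the master only, no cluster needed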


The Sys.info() call can be defined in an init function that is run on each node:

nodeinfo <- function() Sys.info()[c("nodename", "machine")]


Then, you can use performParallel:

res1 <- performParallel(cnt, x = myx, fun = myNorm, initfun = nodeinfo, seed = mySeeds)
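
Because each replicate gets its own random number stream, the results 
should not depend on the number of nodes. A quick way to convince 
yourself (assuming the default "RNGstream" generator) is to repeat the 
run with a different cluster size and the same seed:

res2 <- performParallel(2, x = myx, fun = myNorm, initfun = nodeinfo, seed = mySeeds)
identical(sapply(res1, "[[", 6), sapply(res2, "[[", 6))   # expected TRUE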


> Would you care to put an example like this in your
> documentation?
I can do that.

Note that snowFT is not an attempt to replace snow; it is an extension 
of it. It simplifies snow's usage to make it accessible to people who 
might otherwise feel too intimidated by parallel computing, and it adds 
benefits such as reproducibility and fault tolerance. For people who 
want to explore and dig into the details, the full set of snow 
functions is still available.

> And then explain how a user can grab any one arbitrary stream and
> re-run it for interactive investigation of its properties.  When we
> run this thing 1000 times and 2 are really off the usual result, we
> want to dig in and try to see what happened.
There is no function that allows this. It would be a nice extension 
though - let me know if you have a suggestion for an implementation.
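
If you want to experiment in the meantime, the per-replicate streams 
are created with the rlecuyer package, so in principle you can 
re-create stream i on the master and re-run that replicate serially. 
The lines below are only a sketch: they assume the streams are seeded 
from mySeeds and named by replicate number ("1", "2", ...), which may 
not match what snowFT does internally, so treat it as a starting point 
rather than a guaranteed reproduction of the parallel run.

library(rlecuyer)

r <- 33
i <- 17                                # the replicate to investigate
.lec.SetPackageSeed(mySeeds)           # same 6-number seed as before
.lec.CreateStream(as.character(1:r))   # one stream per replicate (assumed naming)
.lec.CurrentStream(as.character(i))    # make stream i the current RNG
res.i <- myNorm(myx[i])                # re-run replicate i serially
.lec.CurrentStreamEnd()                # switch back to the default RNG
.lec.DeleteStream(as.character(1:r))   # clean up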

Hana



> Here's my test case:
>
> ### r: number of streams. Should be set as BIGGEST number of runs=streams
> ### you could ever want to replicate.  It sets a framework of streams
> ### that is the same on all nodes.  Here I have 33 streams, only 10 nodes.
> ### snowFT handles the problem of creating 33 separate streams, so there
> ### is one ready for each possible run, no matter which node is doing
> ### the work.
> r<- 33
> ### cnt: number of nodes
> cnt<- 10
>
> cl<- makeClusterFT(cnt, type="MPI")
>
> ### From snowFT methods:
> printClusterInfo(cl)
>
> ### Can use SNOW methods as well.
> ### Testing with SNOW methods: sends function to each system
> clusterCall( cl, function() Sys.info()[c("nodename","machine")])
>
> ### Some user-written functions involved in a simulation
> myA<- function( x ){
>    2 *x
> }
>
> myB<- function( x ){
>    3 * x
> }
>
>
> myC<- function( x, y){
>    x + y
> }
>
> ## The main function of interest
> myNorm<- function (x){
>    whew<-  myA(x)
>    whewyou<- myB(whew)
>    whewwho<- myC(whew, whewyou)
>    y<- rnorm(whewwho)
>    list(x, whew, whewyou, whewwho, y, sum(y))
> }
>
>
> mySeeds<- c(1231, 2323, 43435, 12123, 22442, 634654)
> ##create "x" vector.
> myx<- sample(1:8, r, replace=T)
>
> ## Send functions to systems with SNOW functions
> clusterExport(cl, "myA")
> clusterExport(cl, "myB")
> clusterExport(cl, "myC")
>
> clusterSetupRNG.FT(cl, type = "RNGstream", streamper="replicate", n=r,
> seed=mySeeds)
> res1<- clusterApplyFT(cl, x=myx, fun=myNorm, seed=mySeeds)
>
> print(res1[[1]])


