[R] Parallel computing with the snow package: external file I/O possible?
Waichler, Scott R
Scott.Waichler at pnl.gov
Tue Mar 14 00:50:13 CET 2006
Hello,
I am trying to do model autocalibration using the snow and rgenoud
packages. The function I want to run in task-parallel fashion across
multiple machines is one that pre- and post-processes data and runs an
external model code. My problem is that external file I/O happens
only on the master node and not on the slaves. I followed Jasjeet
Sekhon's suggestion to test the cluster setup, and that works fine:
> library(snow)
>
> #pick two machines
> cl <- makeCluster(c("moab","escalante"))
>
> clusterCall(cl, sin, 2)
> The output should be:
> > clusterCall(cl, sin, 2)
> [[1]]
> [1] 0.9092974
>
> [[2]]
> [1] 0.9092974
>
I do indeed get the above result, so I presume the network setup is ok.
Next I tested a function that creates a file. Here is the code that I
sourced from the master ("moab"):
# begin script
library(snow)
setDefaultClusterOptions(outfile="/tmp/cluster1")
setDefaultClusterOptions(master="moab")
cl <- makeCluster(c("moab", "escalante"), type="SOCK")
# Define base pathname for output from my.test()
base.dir <- "./test"
# Define a function that includes some file I/O
my.test <- function(base.dir) {
  # to tag the node that makes the file
  this.host <- as.character(system("hostname"))
  # to be 'sure' the files have different names
  this.rnd <- sample(1:1e6, 1)
  test.file <- paste(sep="", base.dir, "_", this.host, "_", this.rnd)
  file.create(test.file)
} # end my.test()
g <- clusterCall(cl, my.test, base.dir)
print(g)
stopCluster(cl)
# end script
The output (g) was as follows:
[[1]]
[1] TRUE
[[2]]
[1] TRUE
But only one file was created, which I suspect was written by the master
node; the process on the slave did not create a second file. Also,
system("hostname") returns the number 0 for moab instead of the name.
Any ideas as to what might be wrong?
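One possible lead on both symptoms, offered as a sketch and not as a confirmed diagnosis: in R, system() returns the command's exit status unless intern = TRUE is given, which would explain the 0. A relative path such as "./test" is also resolved against each process's own working directory, so a file written by the slave may land somewhere other than where the master writes. The variant below (my.test.abs is a hypothetical name, not part of the script above) takes the node name from Sys.info() and reports the working directory and resolved path, so each node's result shows where its file went.

```r
# Hypothetical variant of my.test(); the name my.test.abs is an assumption
# made for illustration and does not appear in the original script.
my.test.abs <- function(base.dir) {
  # Sys.info() returns the node name as a character string.
  # system("hostname") without intern = TRUE returns the exit status instead.
  this.host <- Sys.info()[["nodename"]]
  this.rnd <- sample(1:1e6, 1)
  test.file <- paste(sep = "", base.dir, "_", this.host, "_", this.rnd)
  created <- file.create(test.file)
  # Report where the file landed, resolved against this process's
  # working directory.
  list(host = this.host,
       wd = getwd(),
       file = normalizePath(test.file, mustWork = FALSE),
       created = created)
}

# Local check, writing under the session's temporary directory
res <- my.test.abs(file.path(tempdir(), "test"))
print(res)
```

On the cluster, clusterCall(cl, my.test.abs, base.dir) would return one such list per node, which should make it possible to see whether the second file exists under a different directory on escalante.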
Thanks,
Scott Waichler
scott.waichler _at_ pnl.gov