[R] Passing data among multiple instances
Warren Young
warren at etr-usa.com
Wed Feb 4 16:02:46 CET 2009
Feng Li wrote:
>
> I have two R instances running at the same time,
On the same computer, or on different computers?
Is the number of Rs likely to change, or will it always be just the two?
Is this a simple one-off problem, or are you breaking the problem up
into pieces so you can throw lots of hardware at it?
> Is there a simpler way to pass the data in A to B?
Perhaps the simplest option is to write the data structure to a file,
using any of the several R ways to do that. When instance 2 sees that a
file is available, it slurps its contents in and works on it. The hard
part is making the second instance wait until the whole file is written
out by the first. You wouldn't want it to read in half the file then
hit the end because the first process hasn't finished writing out the
file. I don't see any good mechanism in R to fix this.
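One common workaround, which is not from the original post, is to have the first instance write to a temporary name and rename the file only once the write has finished. The second instance then polls for the final name. The sketch below assumes both instances share a filesystem on which a rename within one directory is effectively atomic (typical on POSIX systems; behaviour on other platforms may differ). The function names, the ".tmp" suffix, and the polling intervals are all hypothetical choices for illustration.

```r
# Writer side (instance A): write under a temporary name, then rename.
write_atomically <- function(obj, path) {
  tmp <- paste0(path, ".tmp")    # hypothetical naming convention
  saveRDS(obj, tmp)              # the complete write goes to the temp file
  ok <- file.rename(tmp, path)   # final name appears only after the write
  if (!ok) stop("rename failed: ", tmp, " -> ", path)
  invisible(path)
}

# Reader side (instance B): poll until the final name exists, then read.
read_when_ready <- function(path, poll_secs = 0.5, timeout_secs = 60) {
  waited <- 0
  while (!file.exists(path)) {
    if (waited >= timeout_secs) stop("timed out waiting for ", path)
    Sys.sleep(poll_secs)
    waited <- waited + poll_secs
  }
  readRDS(path)
}
```

Instance A would call write_atomically(my_data, "handoff.rds") and instance B would call read_when_ready("handoff.rds"), where "handoff.rds" is a made-up file name that both sides agree on in advance.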
A more robust option is to use sockets. This is suitable even within a
single machine. See ?make.socket. This solves the "how do I know when
I've got the full data structure" problem because the second process can
just keep reading until it gets an error indicating that the remote peer
closed the connection. Once you have the data structure in string form,
you can parse() it and then eval() the result to get an R object
suitable for munching on (eval() on a bare string just returns the
string). Figuring out how to pass the data might be the hardest part.
deparse() might be the easiest way.
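The string round trip can be sketched in a single R session, leaving out the socket transport itself. This is only an illustration with a made-up object: deparse() turns the object into R source text, and because a string has to be parsed before it can be evaluated, the receiving side uses eval(parse(text = ...)).

```r
# Sender side: turn an R object into text that could be written to a socket.
x <- list(id = 7L, values = c(1.5, 2.5), label = "run A")  # made-up data
txt <- paste(deparse(x), collapse = "\n")  # deparse() may return several lines

# Receiver side: rebuild the object from the text that was read.
y <- eval(parse(text = txt))

# For this simple object the rebuilt copy matches the original.
stopifnot(identical(x, y))
```

Two caveats: doubles with many significant digits may not survive deparse() exactly, and eval() runs whatever code the text contains, so this only makes sense between processes you trust. serialize() and unserialize() are a binary alternative that avoids both issues.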
If you're hoping to scale this up to lots of processes, look into Rmpi.
This provides a very clean way for an R program on one computer to
start slaves on other computers and then pass data to them in native R
structures. Setting up MPI itself is not trivial, however. It's best
when you already have a cluster of computers linked with MPI.