[R] Rserve and R to R communication
Simon Urbanek
Simon.Urbanek at r-project.org
Mon Apr 9 19:42:44 CEST 2007
On Apr 7, 2007, at 10:56 AM, Ramon Diaz-Uriarte wrote:
> Dear All,
>
> The "clients.txt" file of the latest Rserve package, by Simon
> Urbanek, says, regarding its R client,
>
> "(...) a simple R client, i.e. it allows you to connect to Rserve
> from R itself. It is very simple and limited, because Rserve was
> not primarily meant for R-to-R communication (there are better ways
> to do that), but it is useful for quick interactive connection to
> an Rserve farm."
>
> Which are those better ways to do it? I am thinking about using
> Rserve to have an R process send jobs to a bunch of Rserves in
> different machines. It is like what we could do with Rmpi (or pvm),
> but without the MPI layer. Therefore, presumably it'd be easier to
> deal with network problems, machine failures, using checkpoints,
> etc. (i.e., to try to get better fault tolerance).
>
> It seems that Rserve would provide the basic infrastructure for
> doing that and saves me from reinventing the wheel of using
> sockets, etc, directly from R.
>
> However, Simon's comment about better ways of R-to-R communication
> made me wonder if this idea really makes sense. What is the catch?
> Have other people tried similar approaches?
>

I was referring to direct R-to-R communication, either with sockets plus
'serialize' in R or with the 'snow' package for parallel processing. The
latter could be useful for what you have in mind, because it includes
a socket-based implementation that lets you spawn multiple children
(across multiple machines) and collect their results. It uses regular
rsh or ssh to start the jobs, so if you can use those, it should work
for you. 'snow' also has PVM and MPI implementations; the PVM one is
really easy to set up (on Unix), and it is what I was using for
parallel computing in R on a cluster.
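To make the first option more concrete, here is a minimal sketch of the 'serialize' half in base R. In a real setup each side would hold a connection from socketConnection(), the sender would call serialize(job, con), and the receiver would call unserialize(con). To keep the sketch runnable without a network, both sides run in one process and the bytes are passed around as a raw vector. The job layout (a list with 'fun' and 'args') is a hypothetical convention chosen for illustration, not something R or Rserve defines.

```r
# Hypothetical job layout: a function plus a named list of its arguments.
job <- list(fun = function(x, y) x + y, args = list(x = 2, y = 40))

# Sender side: turn the job into bytes. Over a socket you would instead
# call serialize(job, con) on a connection from socketConnection().
bytes <- serialize(job, connection = NULL)

# Receiver side: rebuild the object from the bytes and run the job.
received <- unserialize(bytes)
result <- do.call(received$fun, received$args)

# The result travels back the same way (serialize, then unserialize).
reply <- unserialize(serialize(result, connection = NULL))
print(reply)
```

The receiving side only needs base R, which is part of what makes this approach attractive for quick R-to-R links; framing, error handling and reconnection are left entirely to you.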

Rserve is roughly comparable, but thanks to its client/server design it
also provides the spawning infrastructure. What it lacks are the
convenience functions that snow provides, such as clusterApply. Come to
think of it, it would actually be possible to add them, although I
admit that the original goal of Rserve was not parallel computing :).
The idea was to have one Rserve server and multiple clients, whereas in
'snow' you effectively have one client and multiple servers. You could
spawn multiple Rserve instances on multiple machines, but Rserve itself
doesn't provide any load balancing out of the box, so you would have to
handle that yourself.
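For comparison, the snow-style convenience functions can be sketched with the 'parallel' package, which ships with current versions of R and incorporates snow's socket-cluster interface (with snow itself the constructor would be makeSOCKcluster). This sketch assumes two local workers only, so no rsh/ssh is involved; passing host names instead of a count would start the workers remotely. clusterApply hands the elements to the workers in turn, while clusterApplyLB gives the next element to whichever worker becomes free first, which is the kind of load balancing you would otherwise have to write yourself on top of several Rserve instances.

```r
library(parallel)

# Two worker R processes on this machine; a character vector of host
# names such as c("node1", "node2") would start them remotely via ssh.
cl <- makePSOCKcluster(2)

# clusterApply: elements are sent to the workers in turn (round-robin).
squares <- clusterApply(cl, 1:4, function(i) i^2)

# clusterApplyLB: the next element goes to whichever worker is free first,
# so uneven task durations are balanced; results keep the input order.
sleepy <- clusterApplyLB(cl, c(0.2, 0.1, 0.3), function(s) {
  Sys.sleep(s)
  s
})

# Shut the worker processes down.
stopCluster(cl)

print(unlist(squares))
```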

I don't know if that helps... :)

Cheers,
Simon
More information about the R-help mailing list