[R] Rserve and R to R communication

Mon Apr 9 19:42:44 CEST 2007

On Apr 7, 2007, at 10:56 AM, Ramon Diaz-Uriarte wrote:

> Dear All,
>
> The "clients.txt" file of the latest Rserve package, by Simon  
> Urbanek, says, regarding its R client,
>
> "(...) a simple R client, i.e. it allows you to connect to Rserve  
> from R itself. It is very simple and limited,  because Rserve was  
> not primarily meant for R-to-R communication (there are better ways  
> to do that), but it is useful for quick interactive connection to  
> an Rserve farm."
>
> Which are those better ways to do it? I am thinking about using  
> Rserve to have an R process send jobs to a bunch of Rserves in  
> different machines. It is like what we could do with Rmpi (or pvm),  
> but without the MPI layer. Therefore, presumably it'd be easier to  
> deal with network problems, machine failures, using checkpoints,  
> etc. (i.e., to try to get better fault tolerance).
>
> It seems that Rserve would provide the basic infrastructure for  
> doing that and save me from reinventing the wheel of using  
> sockets, etc., directly from R.
>
> However, Simon's comment about better ways of R-to-R communication  
> made me wonder if this idea really makes sense. What is the catch?  
> Have other people tried similar approaches?
>

I was commenting on direct R-to-R communication using sockets +  
'serialize' in R, or the 'snow' package for parallel processing. The  
latter could be useful for what you have in mind, because it includes  
a socket-based implementation which allows you to spawn multiple  
children (across multiple machines) and collect their results. It  
uses regular rsh or ssh to start the jobs, so if you can use those, it  
should work for you. 'snow' also has PVM and MPI implementations; the  
PVM one is really easy to set up (on unix), and that was what I was  
using for parallel computing in R on a cluster.

Rserve is sort of comparable, but in addition it provides the  
spawning infrastructure due to its client/server concept. What it  
doesn't have are the convenience functions that snow provides, like  
clusterApply etc. Come to think of it, it would actually be possible  
to add them, although I admit that the original goal of Rserve was not  
parallel computing :). The idea was to have one Rserve server and  
multiple clients, whereas in 'snow' you sort of have one client and  
multiple servers. You could spawn multiple Rserves on multiple  
machines, but Rserve itself doesn't provide any load-balancing out of  
the box, so you'd have to do that yourself.
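
One simple way to do that load-balancing by hand is round-robin assignment of jobs to servers. The sketch below shows only the scheduling logic. The host names, the job list, and run_on_host() are hypothetical: run_on_host() is a local stand-in for whatever call would actually send the job to an Rserve instance on that host and return its result.

```r
# Hypothetical host names; replace with machines that actually run Rserve.
hosts <- c("node1", "node2", "node3")

# Illustrative jobs: each is just a vector of numbers to be summed.
jobs <- list(1:10, 11:20, 21:30, 31:40, 41:50)

# Round-robin: job i is assigned to host ((i - 1) mod n) + 1.
assign_round_robin <- function(n_jobs, hosts) {
  hosts[(seq_len(n_jobs) - 1L) %% length(hosts) + 1L]
}

# Stand-in for the remote call (assumption): a real version would connect
# to Rserve on 'host', evaluate the job there, and return the result.
run_on_host <- function(host, job) sum(job)

targets <- assign_round_robin(length(jobs), hosts)
results <- mapply(run_on_host, targets, jobs)

print(data.frame(host = targets, result = unname(results)))
```

Round-robin ignores how busy each server is; a scheme that hands the next job to whichever server finishes first would balance uneven jobs better, at the cost of more bookkeeping.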

I don't know if that helps... :)

Cheers,
Simon