[Rd] How to execute R scripts simultaneously from multiple threads
Jeffrey Horner
jeff.horner at vanderbilt.edu
Thu Jan 4 19:27:09 CET 2007
Vladimir Dergachev wrote:
> On Thursday 04 January 2007 4:54 am, Erik van Zijst wrote:
>> Vladimir Dergachev wrote:
>>> On Wednesday 03 January 2007 3:47 am, Erik van Zijst wrote:
>>>> Apparently the R C API does not provide a mechanism for parallel
>>>> execution.
>>>>
>>>> We would prefer a solution that is not based on multi-processing (like
>>>> a client/server design), because that would introduce IPC overhead.
>>> One thing to keep in mind is that IPC is very fast in Linux. So unless
>>> you are making lots of calls to really tiny functions this should not be
>>> an issue.
>> Using pipes or shared memory to pass things around to other processes on
>> the same box is very fast indeed, but if we base our design around
>> something like RServe, which uses TCP, it could be significantly slower.
>> Our R-based system will be running scripts in response to high-volume
>> real-time stock exchange data, so we expect lots of calls to many tiny
>> functions indeed.
>
> Very interesting :)
>
> If you are running RServe on the other box you will need to send data over
> ethernet anyway (and will probably use TCP). If it is on the same box and you
> use "localhost" the packets will go over loopback - which would be
> significantly faster.
I haven't looked at RServe in a while, but I think that it fires up an R
interpreter in response to a client request and then sticks around to
serve additional requests from the same client. The question is how it
manages all the R interpreters as demand varies.
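For reference, round-tripping a call to an Rserve instance over loopback
is only a few lines on the client side. A minimal sketch, assuming the
RSclient package and an Rserve already listening on the default port
6311 (RS.connect, RS.eval and RS.close come from RSclient, not from
Rserve itself):

    ## Evaluate one expression in the server's interpreter and pull
    ## the result back over the local socket.
    library(RSclient)
    con <- RS.connect(host = "localhost", port = 6311)
    res <- RS.eval(con, sum(rnorm(1000)))  # runs in the server process
    RS.close(con)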
This issue is solved when you embed R into Apache (using the prefork
MPM), as the pool of Apache child processes (each with its own R
interpreter) expands and contracts on demand. Using this with the
loopback device would be a nice solution:
http://biostat.mc.vanderbilt.edu/RApacheProject
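To give a flavor of it, a handler under this model is just an R function
evaluated inside the child's embedded interpreter, so no threading is
involved. A rough sketch, with setContentType() and the OK return value
named as I remember the RApache conventions (check the project page for
the exact API):

    ## Each prefork child owns one R interpreter; Apache grows and
    ## shrinks the pool of children (and hence interpreters) on demand.
    handler <- function() {
        setContentType("text/plain")
        cat("served by pid", Sys.getpid(), "\n")  # which child answered
        OK
    }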
Jeff
--
http://biostat.mc.vanderbilt.edu/JeffreyHorner