[Rd] How to execute R scripts simultaneously from multiple threads
Jeffrey Horner
jeff.horner at vanderbilt.edu
Thu Jan 4 19:27:09 CET 2007
Vladimir Dergachev wrote:
> On Thursday 04 January 2007 4:54 am, Erik van Zijst wrote:
>> Vladimir Dergachev wrote:
>>> On Wednesday 03 January 2007 3:47 am, Erik van Zijst wrote:
>>>> Apparently the R C API does not provide a mechanism for parallel
>>>> execution.
>>>>
>>>> We would prefer a solution that is not based on multi-processing (like
>>>> a client/server design), because that would introduce IPC overhead.
>>> One thing to keep in mind is that IPC is very fast in Linux. So unless
>>> you are making lots of calls to really tiny functions this should not be
>>> an issue.
>> Using pipes or shared memory to pass things around to other processes on
>> the same box is very fast indeed, but if we base our design around
>> something like RServe, which uses TCP, it could be significantly slower.
>> Our R-based system will be running scripts in response to high-volume
>> real-time stock exchange data, so we expect lots of calls to many tiny
>> functions indeed.
>
> Very interesting :)
>
> If you are running RServe on the other box you will need to send data over
> ethernet anyway (and will probably use TCP). If it is on the same box and you
> use "localhost" the packets will go over loopback - which would be
> significantly faster.
I haven't looked at RServe in a while, but I think that it fires up an R
interpreter in response to a client request and then sticks around to
serve additional requests from the same client. The question is how it
manages all the R interpreters as demand varies.
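For reference, round-tripping a call to an Rserve instance over loopback
is only a few lines on the client side. A minimal sketch, assuming the
RSclient package and an Rserve already listening on the default port
6311 (RS.connect, RS.eval and RS.close come from RSclient, not from
Rserve itself):

    ## Evaluate one expression in the server's interpreter and pull
    ## the result back over the local socket.
    library(RSclient)
    con <- RS.connect(host = "localhost", port = 6311)
    res <- RS.eval(con, sum(rnorm(1000)))  # runs in the server process
    RS.close(con)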
This issue is solved when you embed R into Apache (using the prefork
MPM), as the pool of Apache child processes (each with its own R
interpreter) expands and contracts on demand. Using this with the
loopback device would be a nice solution:
http://biostat.mc.vanderbilt.edu/RApacheProject
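To give a flavor of it, a handler under this model is just an R function
evaluated inside the child's embedded interpreter, so no threading is
involved. A rough sketch, with setContentType() and the OK return value
named as I remember the RApache conventions (check the project page for
the exact API):

    ## Each prefork child owns one R interpreter; Apache grows and
    ## shrinks the pool of children (and hence interpreters) on demand.
    handler <- function() {
        setContentType("text/plain")
        cat("served by pid", Sys.getpid(), "\n")  # which child answered
        OK
    }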
Jeff
--
http://biostat.mc.vanderbilt.edu/JeffreyHorner