[Rd] Pre-compilation and server-side parallel execution

Erik van Zijst r at erik.prutser.cx
Fri Dec 8 15:51:39 CET 2006


Folks,

My company operates a platform that distributes real-time financial data
from exchanges to users. To extend our services, I want to let users
write and submit custom R scripts that run on our streaming data and
perform real-time analysis.

We have thousands of users deploying scripts, and each script is
evaluated repeatedly whenever certain conditions in the stream are met.
For example, a script could compute the NASDAQ-100 index value each time
one of its 100 constituents trades.
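Per-trade re-evaluation of an index like this does not have to redo the
whole sum. Below is a minimal Python sketch of an incrementally
maintained index. It assumes a plain capitalization-weighted index
(value = sum of price * shares, divided by a divisor). The real
NASDAQ-100 methodology is more involved, and the class name, symbols,
share counts and divisor are all hypothetical.

```python
class IncrementalIndex:
    """Cap-weighted index updated in O(1) per trade.

    Hypothetical simplified methodology: value = sum(price * shares) / divisor.
    """

    def __init__(self, shares, prices, divisor):
        # shares and prices are dicts keyed by ticker symbol.
        self.shares = dict(shares)
        self.prices = dict(prices)
        self.divisor = divisor
        # Full sum is computed once, at construction time.
        self.total = sum(self.shares[s] * self.prices[s] for s in self.shares)

    def on_trade(self, symbol, price):
        """Apply one trade and return the new index value."""
        old = self.prices[symbol]
        # Only the traded constituent's contribution changes.
        self.total += self.shares[symbol] * (price - old)
        self.prices[symbol] = price
        return self.total / self.divisor
```

Each trade touches one constituent, so the update cost stays constant
no matter how many constituents the index has.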

Scripts are typically small and execute quickly. Each script is
registered once and then evaluated repeatedly with different parameters
(possibly several times per second per script). In this context, my
biggest concern is scalability.

The evaluation engine is a purely server-side component without display
capabilities. An R script is invoked with parameters, and whatever it
returns is sent to the user.

Ideally, I'd use a C API to interact with the interpreter. I've looked
at projects like R/Apache, RServe and RSJava for inspiration and
concluded that all of them work by forking multiple instances of the R
engine, each of which evaluates one script at a time.

As our service must evaluate many different scripts concurrently 
(isolated from one another), I have the following concerns:

1. Spawning a pool of engine instances for massively parallel execution
is expensive, although it might work given enough memory.
2. As far as I can tell, R's native C API
[http://cran.r-project.org/doc/manuals/R-exts.html#The-R-API] does not
separate parsing from evaluation. When the same script is evaluated 10
times, it is also parsed 10 times.

I'm mostly concerned about the second issue. Our scripts are registered
once and evaluated continuously, so I want to avoid re-parsing a script
every time it is evaluated. Does the engine recognize previously parsed
scripts (as Oracle does for SQL queries)?
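The Oracle-style behaviour asked about here (reusing the parsed form
when the exact same text is submitted again) can be mimicked with a
cache keyed on the script text. This is a minimal sketch with Python's
compile() standing in for R's parser. The ParseCache name, the SHA-256
key and the LRU eviction with a fixed capacity are assumptions for
illustration.

```python
import hashlib
from collections import OrderedDict


class ParseCache:
    """Share parsed code between identical script texts, with LRU eviction."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._entries = OrderedDict()  # sha256 of source -> code object
        self.hits = 0
        self.misses = 0

    def get(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        code = self._entries.get(key)
        if code is not None:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return code
        # Cache miss: parse once and remember the result.
        self.misses += 1
        code = compile(source, "<cached script>", "exec")
        self._entries[key] = code
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return code
```

Evaluating the same text ten times through this cache parses it once.
Because the key is the exact text, scripts that differ only in
whitespace or comments get separate entries. Normalizing the text first
would be one possible refinement.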

I'd be interested to hear your thoughts on these concerns and whether
you think R would work in this architecture.

Kind regards,
Erik van Zijst
-- 
And on the seventh day, He exited from append mode.