[Rd] Pre-compilation and server-side parallel execution
Erik van Zijst
r at erik.prutser.cx
Fri Dec 8 15:51:39 CET 2006
Folks,
My company operates a platform that distributes real-time financial data
from exchanges to users. To extend our services I want to allow users to
write and submit custom R scripts to our platform that operate on our
streaming data to do real-time analysis.
We have thousands of users deploying scripts and each script is
evaluated repeatedly when certain conditions in the stream apply. For
example, a script could compute the NASDAQ100 index value each time one
of its 100 constituents trades.
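As an illustration of this kind of trigger-driven script, the sketch below recomputes a weighted index whenever one constituent trades. It is written in Python purely as a neutral stand-in, not as the R a user would submit. The class name `IndexScript`, the two-symbol universe, the weights and the divisor are all hypothetical, and the formula is a simplified weighted sum rather than the actual NASDAQ-100 methodology.

```python
class IndexScript:
    """Hypothetical user script: recompute an index value on each trade."""

    def __init__(self, weights, last_prices, divisor):
        self.weights = dict(weights)          # symbol -> weight (assumed values)
        self.last_prices = dict(last_prices)  # symbol -> most recent trade price
        self.divisor = divisor                # assumed scaling constant

    def on_trade(self, symbol, price):
        """Called by the platform when a constituent trades; returns the new value."""
        if symbol not in self.weights:
            raise KeyError(f"{symbol} is not a constituent")
        self.last_prices[symbol] = price
        total = sum(self.weights[s] * self.last_prices[s] for s in self.weights)
        return total / self.divisor


# Two made-up constituents instead of 100, to keep the example short.
script = IndexScript(
    weights={"AAA": 2.0, "BBB": 1.0},
    last_prices={"AAA": 10.0, "BBB": 20.0},
    divisor=4.0,
)
value = script.on_trade("AAA", 12.0)  # (2.0 * 12.0 + 1.0 * 20.0) / 4.0
print(value)
```

The script object is created once, at registration, and `on_trade` is the part that runs several times per second.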
Scripts are typically small and execute quickly. Each script is
registered once and then repeatedly evaluated with different parameters
(possibly several times per second per script). In this context my
biggest concern is scalability.
The evaluation engine is a pure server-side component without display
capabilities. An R script is invoked with parameters, and whatever it
returns is sent to the user.
Ideally I'd need a C API to interact with the interpreter. I've looked
at projects like R/Apache, Rserve and RSJava for inspiration and came to
the conclusion that all these projects work by forking multiple
instances of the R engine, where each instance evaluates one script at a
time.
As our service must evaluate many different scripts concurrently
(isolated from one another), I have the following concerns:
1. Spawning a pool of engine instances for massively parallel execution is
expensive, but might work with lots of memory.
2. R's native C API
[http://cran.r-project.org/doc/manuals/R-exts.html#The-R-API] does not
separate parsing from evaluation. When the same script is evaluated 10
times, it is also parsed 10 times.
I'm mostly concerned about the second issue. Our scripts are registered
once and continuously evaluated. I want to avoid parsing the same script
again each time it is evaluated. Does the engine recognize previously
parsed scripts (like Oracle does for SQL queries)?
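The behaviour asked about here, parsing a script once and reusing the parsed form on every later evaluation, can be sketched as a cache keyed on the script text. The example is Python rather than R, and uses Python's built-in `compile()` and `eval()` as stand-ins for a separate parse step and evaluation step (at the R language level the analogous functions are `parse()` and `eval()`). `ScriptCache`, its methods and the sample script are hypothetical names, and emptying `__builtins__` is not a real security sandbox.

```python
import hashlib


class ScriptCache:
    """Parse each distinct script text once; evaluate it many times."""

    def __init__(self):
        self._compiled = {}   # SHA-256 of the source text -> compiled code object
        self.parse_count = 0  # number of times a script was actually parsed

    def _get_code(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        code = self._compiled.get(key)
        if code is None:
            # Cache miss: parse/compile the expression and remember the result.
            code = compile(source, "<user-script>", "eval")
            self._compiled[key] = code
            self.parse_count += 1
        return code

    def evaluate(self, source, params):
        # A fresh namespace per call keeps evaluations isolated from each other.
        # Note: an empty __builtins__ is NOT a security boundary.
        namespace = dict(params)
        return eval(self._get_code(source), {"__builtins__": {}}, namespace)


cache = ScriptCache()
script_text = "price * quantity"  # made-up user script
results = [
    cache.evaluate(script_text, {"price": p, "quantity": 10})
    for p in (1.5, 2.0, 2.5)
]
print(results, cache.parse_count)
```

Each call gets its own namespace built from its parameters, so evaluations share the parsed code but not state. Whether an embedded R interpreter can be driven the same way is the question being put to the list.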
I am interested to hear your thoughts on my concerns and whether you think
R would work in this architecture.
Kind regards,
Erik van Zijst
--
And on the seventh day, He exited from append mode.
More information about the R-devel mailing list