[R] R Production Performance
Joe Conway
mail at joeconway.com
Wed Sep 24 06:47:42 CEST 2003
Paul Meagher wrote:
> Below is the test I ran awhile back on invoking R as a system call. It
> might be faster if you had a c-extension to R but before I went that route I
> would want to know 1) roughly how fast Python and Perl are in returning
> results with their c-bindings/embedded stuff/dcom stuff, 2) whether R can be
> run as a daemon process so you don't incur start up costs, and 3) whether R
> can act as a math server in the sense that it will fork children or threads
> as multiple users establish sessions with it. I agree it would be nice to
> have a better interface to R than via a system call.
>
I'm doing something similar with Postgres, R, and PHP, using PL/R (an R
procedural language handler for Postgres that I wrote). In Postgres 7.4
(currently at beta3), or with a back-patched copy of 7.3, you can
preload the R interpreter when the Postgres postmaster first starts, so
R is essentially running as part of the Postgres daemon. Whenever a
connection is made to the database, the forked backend process already
has an initialized copy of R inside it. The startup savings I see are
similar to what you measured (about 2.2 seconds versus about 0.01
seconds for the first function call):
------------------------------------------------------------------
Function -- intentionally very simple:
--------------------------------------
create or replace function echo(text) returns text as 'print(arg1)'
language 'plr';
Without preloading (first function call):
-----------------------------------------
regression=# explain analyze select echo('hello');
Total runtime: 2195.35 msec
Without preloading (second function call):
-----------------------------------------
regression=# explain analyze select echo('hello');
Total runtime: 0.55 msec
With preloading (first function call):
-----------------------------------------
regression=# explain analyze select echo('hello');
Total runtime: 9.74 msec
With preloading (second function call):
-----------------------------------------
regression=# explain analyze select echo('hello');
Total runtime: 0.59 msec
------------------------------------------------------------------
In both cases the second (and subsequent) function calls are even faster
because the PL/R function itself has been precompiled and cached.
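As a rough sketch of the preloading setup described above: in a 7.4-era
postgresql.conf it should come down to a single line. The library path
and the plr_init function name below are assumptions based on the PL/R
documentation of that period, not something stated in this message, so
check them against your installed version.

```conf
# postgresql.conf (PostgreSQL 7.4-era syntax; assumed, verify against your PL/R docs)
# Load the PL/R shared library at postmaster start and run its init function,
# so every forked backend inherits an already-initialized R interpreter.
preload_libraries = '$libdir/plr:plr_init'
```

The setting is read at server start, so the postmaster has to be
restarted before the first-call timing changes.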
I call the PL/R function from PHP to read my data directly from the
database, process it, and generate whatever charts I need. Here's a very
simple example:
The PL/R function:
------------------------------------------------------------------
create type histtup as
(
break float8,
count int
);
create or replace function hist(text, text)
returns setof histtup as '
  sql <- paste("select id_val from sample_numeric_data ",
               "where ia_id=''", arg1, "''", sep="")
  rs <- pg.spi.exec(sql)
  if (!is.na(arg2)) {
    # a file name was given: draw the histogram into that jpeg file
    x11(display=":5")
    jpeg(file=arg2, width = 480, height = 480,
         pointsize = 12, quality = 75)
    par(ask = FALSE, bg = "#F8F8F8")
    sql <- paste("select ia_attname as val from atts ",
                 "where ia_id=''", arg1, "''", sep="")
    attname <- pg.spi.exec(sql)
    h <- hist(rs[,1], col = "blue",
              main = paste("Histogram of", attname$val),
              xlab = attname$val)
    dev.off()
    system(paste("chmod 666 ", arg2, sep=""),
           intern = FALSE, ignore.stderr = TRUE)
  } else {
    # no file name: compute the bins only
    h <- hist(rs[,1], plot = FALSE)
  }
  # hist() returns one more break than bins, so drop the last break
  result <- data.frame(breaks = h$breaks[1:(length(h$breaks) - 1)],
                       count = h$counts)
  return(result)
' language 'plr';
------------------------------------------------------------------
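The core of the hist() function above can be tried in a plain R session,
outside PL/R. This is a minimal sketch, assuming a made-up numeric
vector x in place of the query result rs[,1]. It shows why the last
break is dropped: hist() returns one more break than it has bins, so
each remaining break is the lower edge of one bin.

```r
# Sample data standing in for rs[,1]; the values are invented for illustration.
x <- c(1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.4, 4.9, 5.6)

# Compute the histogram without opening a graphics device.
h <- hist(x, plot = FALSE)

# Drop the last break so that breaks and counts have the same length,
# pairing the lower edge of each bin with the number of values in it.
result <- data.frame(breaks = h$breaks[-length(h$breaks)],
                     count  = h$counts)

print(result)
```

Each row of result corresponds to one row of the histtup set that the
PL/R function returns.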
The PHP page:
------------------------------------------------------------------
<HTML><BODY>
<?PHP
  echo "
  <FORM ACTION='{$_SERVER['PHP_SELF']}' METHOD='post' NAME='proto_form'>
  <TABLE WIDTH='482' CELLSPACING='0' CELLPADDING='1' BORDER='0'>
  <TR>
    <TD>Data</TD>
    <TD><INPUT TYPE='text' NAME='userdata' value='' size='80'></TD>
  </TR>
  <TR>
    <TD colspan='2'>
      <INPUT TYPE='submit' NAME='submit' value='Submit'>
    </TD>
  </TR>
  </TABLE>
  </FORM>
  ";
  // only run the query once the form has actually been submitted
  if (isset($_POST['submit']) && $_POST['submit'] == "Submit")
  {
    $tmpfilename = 'charts/hist1.jpg';
    $conn = pg_connect("dbname=oscon user=postgres");
    // escape the user-supplied value before embedding it in the query
    $userdata = pg_escape_string($conn, $_POST['userdata']);
    $sql = "select * from hist('" . $userdata . "','" .
           "/tmp/" . $tmpfilename . "')";
    $rs = pg_query($conn, $sql);
    echo "<img src='$tmpfilename' border=0>";
  }
?>
</BODY></HTML>
------------------------------------------------------------------
Hopefully this gives you some ideas about what is possible. If you're
interested in PL/R, you can grab a copy (along with a patched 7.3.4
source RPM for Postgres) here: http://www.joeconway.com/
HTH,
Joe