[R] R Production Performance
Zitan Broth
zitan at mediasculpt.net
Thu Sep 25 23:20:35 CEST 2003
Hi Joe,
Thanks for your message,
> I'm doing something similar using PL/R (an R procedural language handler
> extension to Postgres that I wrote) with Postgres, R, and PHP. In
> Postgres 7.4 (currently at beta3) or with a back-patched copy of 7.3,
> you can preload the R interpreter when the Postgres postmaster first
> starts. This means that essentially R is running as part of the Postgres
> daemon. Whenever a connection is made to the database, the forked
> process already has an initialized copy of R running inside it. The
> startup savings I see are similar to what you did (2.2 seconds versus
> 0.009 seconds):
That sounds cool; it also avoids the file I/O. I'm an R newbie, but could I
achieve something similar with pure PHP? I was considering trying to use
SWIG to access R (and another package, OOQP). I am currently working
with MySQL for a demo project, but we have always considered Postgres
a more robust database (it actually has stored procedure languages, for
example). I think I may have to give your PL/R a serious look (although I
can't access your site just now).
Are you 'process managing' your calls to R to ensure that it is thread safe,
or have you found this unnecessary with the PHP/Postgres/R combo?
Awesome, Z.
> ------------------------------------------------------------------
> Function -- intentionally very simple:
> --------------------------------------
> create or replace function echo(text) returns text as 'print(arg1)'
> language 'plr';
>
> Without preloading (first function call):
> -----------------------------------------
> regression=# explain analyze select echo('hello');
> Total runtime: 2195.35 msec
>
> Without preloading (second function call):
> -----------------------------------------
> regression=# explain analyze select echo('hello');
> Total runtime: 0.55 msec
>
> With preloading (first function call):
> -----------------------------------------
> regression=# explain analyze select echo('hello');
> Total runtime: 9.74 msec
>
> With preloading (second function call):
> -----------------------------------------
> regression=# explain analyze select echo('hello');
> Total runtime: 0.59 msec
> ------------------------------------------------------------------
>
>
> In both cases the second (and subsequent) function calls are even faster
> because the PL/R function itself has been precompiled and cached.
>
> I call the PL/R function from PHP to read my data directly from the
> database, process it, and generate whatever charts I need. Here's a very
> simple example:
>
>
> The PL/R function:
> ------------------------------------------------------------------
> create type histtup as
> (
>   break float8,
>   count int
> );
>
> create or replace function hist(text, text)
> returns setof histtup as '
>   sql <- paste("select id_val from sample_numeric_data ",
>                "where ia_id=''", arg1, "''", sep="")
>   rs <- pg.spi.exec(sql)
>
>   if (!is.na(arg2)) {
>     x11(display=":5")
>     jpeg(file=arg2, width = 480, height = 480,
>          pointsize = 12, quality = 75)
>     par(ask = FALSE, bg = "#F8F8F8")
>     sql <- paste("select ia_attname as val from atts ",
>                  "where ia_id=''", arg1, "''", sep="")
>     attname <- pg.spi.exec(sql)
>     h <- hist(rs[,1], col = "blue",
>               main = paste("Histogram of", attname$val),
>               xlab = attname$val);
>     dev.off()
>     system(paste("chmod 666 ", arg2, sep=""),
>            intern = FALSE, ignore.stderr = TRUE)
>   }
>   else
>     h <- hist(rs[,1], plot = FALSE);
>
>   result = data.frame(breaks = h$breaks[1:length(h$breaks)-1],
>                       count = h$counts);
>
>   return(result)
> ' language 'plr';
> ------------------------------------------------------------------
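The index expression `h$breaks[1:length(h$breaks)-1]` in the quoted function can look like an off-by-one at first glance. In R, `:` binds tighter than binary `-`, so it evaluates as `(1:n) - 1`, and the resulting zero index selects nothing. A minimal sketch with made-up data (the vector `breaks` below is hypothetical, not taken from the database) showing that this leaves the left edge of each bin, next to a more explicit spelling:

```r
# Hypothetical break points, standing in for h$breaks from hist().
breaks <- c(0, 10, 20, 30, 40)
n <- length(breaks)

# ':' has higher precedence than binary '-', so this is (1:n) - 1.
idx <- 1:n-1

# An index of 0 selects nothing, so only positions 1..(n-1) survive.
left_edges <- breaks[idx]

# A more explicit way to drop the last break point.
left_edges_explicit <- breaks[-n]

print(idx)
print(left_edges)
print(identical(left_edges, left_edges_explicit))
```

Either form yields one break per bin count, which is what lets `data.frame` pair the two columns.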
>
> The PHP page:
> ------------------------------------------------------------------
> <HTML><BODY>
> <?PHP
>   echo "
>   <FORM ACTION='$PHP_SELF' METHOD='post' NAME='proto_form'>
>     <TABLE WIDTH='482' CELLSPACING='0' CELLPADDING='1' BORDER='0'>
>       <TR>
>         <TD>Data</TD>
>         <TD><INPUT TYPE='text' NAME='userdata' value='' size='80'></TD>
>       </TR>
>       <TR>
>         <TD colspan='2'>
>           <INPUT TYPE='submit' NAME='submit' value='Submit'>
>         </TD>
>       </TR>
>     </TABLE>
>   </FORM>
>   ";
>
>   if ($_POST['submit'] == "Submit")
>   {
>     $tmpfilename = 'charts/hist1.jpg';
>     $conn = pg_connect("dbname=oscon user=postgres");
>     $sql = "select * from hist('" . $_POST['userdata'] . "','" .
>            "/tmp/" . $tmpfilename . "')";
>     $rs = pg_query($conn,$sql);
>     echo "<img src='$tmpfilename' border=0>";
>   }
> ?>
> </BODY></HTML>
> ------------------------------------------------------------------
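Both the PHP page and the PL/R function quoted above build SQL by pasting the submitted value between single quotes, so a value that itself contains a quote would break the statement. A minimal sketch of one way to guard the R side, assuming a hypothetical helper `sql_literal` (not part of PL/R) that doubles embedded single quotes. It does not deal with backslashes or other escapes, and inside a PL/R function body the quote characters would need doubling again, as in the quoted code:

```r
# Hypothetical helper, not part of PL/R: wrap a value as a SQL string
# literal, doubling any embedded single quotes.
sql_literal <- function(x) {
  paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
}

# Build the same kind of query as the quoted hist() function.
build_query <- function(id) {
  paste("select id_val from sample_numeric_data where ia_id =",
        sql_literal(id))
}

print(build_query("height"))
print(build_query("O'Brien"))
```

The same concern applies to the string concatenation on the PHP side, which would need its own escaping before the value reaches `hist()`.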
>
>
> Hopefully this gives you some ideas about what is possible. If you're
> interested in PL/R, you can grab a copy (along with a patched 7.3.4
> source RPM for Postgres) here: http://www.joeconway.com/
>
> HTH,
>
> Joe
>
>
>