[R] R Production Performance

Zitan Broth zitan at mediasculpt.net
Thu Sep 25 23:20:35 CEST 2003


Hi Joe,

Thanks for your message,

> I'm doing something similar using PL/R (an R procedural language handler
> extension to Postgres that I wrote) with Postgres, R, and PHP. In
> Postgres 7.4 (currently at beta3) or with a back-patched copy of 7.3,
> you can preload the R interpreter when the Postgres postmaster first
> starts. This means that essentially R is running as part of the Postgres
> daemon. Whenever a connection is made to the database, the forked
> process already has an initialized copy of R running inside it. The
> startup savings I see are similar to what you did (2.2 seconds versus
> 0.009 seconds):
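To make the preloading idea concrete, here is a minimal Python sketch of the same pattern. The expensive initialization is paid once in a parent process, and then one child is forked per connection so each child inherits the already-initialized state. The names `init_interpreter`, `handle_connection` and `serve` are hypothetical stand-ins, the sleep merely simulates R's startup cost, and `os.fork` assumes a Unix-like system. This illustrates the idea only; it is not PL/R's code.

```python
import os
import time

INIT_COUNT = 0  # initializations performed in the current process


def init_interpreter():
    """Hypothetical stand-in for an expensive startup such as loading R."""
    global INIT_COUNT
    time.sleep(0.05)  # simulated startup cost
    INIT_COUNT += 1
    return {"ready": True}


def handle_connection(state):
    """Serve one connection, initializing lazily if the parent did not preload."""
    if state is None:
        state = init_interpreter()
    return state["ready"]


def serve(preload, n_connections=3):
    """Fork one child per connection; return how many inits each child needed."""
    state = init_interpreter() if preload else None  # done once, before forking
    inits_per_child = []
    for _ in range(n_connections):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: inherits a copy of the parent's memory, including `state`.
            try:
                os.close(r)
                before = INIT_COUNT
                handle_connection(state)
                os.write(w, str(INIT_COUNT - before).encode())
                os.close(w)
            finally:
                os._exit(0)  # leave without running the parent's cleanup code
        # Parent: read the child's report, then reap the child.
        os.close(w)
        with os.fdopen(r) as report:
            inits_per_child.append(int(report.read()))
        os.waitpid(pid, 0)
    return inits_per_child
```

With `serve(preload=True)` no child has to initialize anything, while `serve(preload=False)` makes every child pay the startup cost itself, which is the difference the first-call timings reflect.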

That sounds cool; it also avoids the file I/O. I'm an R newbie, but could I
achieve something similar with pure PHP? I was considering using SWIG to
access R (and another package, OOQP). I am currently working with MySQL for
a demo project, but we have always considered Postgres the more robust
database (it actually has stored procedure languages, for example). I think
I may have to give your PL/R a serious look (although I can't access your
site just now).

Are you 'process managing' your calls to R to ensure that they are thread
safe, or have you found this unnecessary with the PHP/Postgres/R combination?

Awesome, Z.

> ------------------------------------------------------------------
> Function -- intentionally very simple:
> --------------------------------------
> create or replace function echo(text) returns text as 'print(arg1)'
> language 'plr';
>
> Without preloading (first function call):
> -----------------------------------------
> regression=# explain analyze select echo('hello');
>   Total runtime: 2195.35 msec
>
> Without preloading (second function call):
> -----------------------------------------
> regression=# explain analyze select echo('hello');
>   Total runtime: 0.55 msec
>
> With preloading (first function call):
> -----------------------------------------
> regression=# explain analyze select echo('hello');
>   Total runtime: 9.74 msec
>
> With preloading (second function call):
> -----------------------------------------
> regression=# explain analyze select echo('hello');
>   Total runtime: 0.59 msec
> ------------------------------------------------------------------
>
>
> In both cases the second (and subsequent) function calls are even faster
> because the PL/R function itself has been precompiled and cached.
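The caching described above can be sketched in Python as a per-process dictionary of compiled functions: the first call compiles the source text, and later calls reuse the cached function object. `FunctionCache` and its key on (name, source) are assumptions made for illustration; PL/R's real cache is implemented in C and differs in detail.

```python
class FunctionCache:
    """Compile each function's source once per process, then reuse it."""

    def __init__(self):
        self._compiled = {}
        self.compile_count = 0

    def call(self, name, source, *args):
        key = (name, source)  # new source (e.g. CREATE OR REPLACE) gets a new entry
        fn = self._compiled.get(key)
        if fn is None:
            # First call: compile the source text and keep the function object.
            namespace = {}
            exec(compile(source, f"<{name}>", "exec"), namespace)
            fn = namespace[name]
            self._compiled[key] = fn
            self.compile_count += 1
        return fn(*args)


if __name__ == "__main__":
    cache = FunctionCache()
    src = "def echo(arg1):\n    return arg1\n"
    print(cache.call("echo", src, "hello"))  # compiles, then runs
    print(cache.call("echo", src, "hello"))  # reuses the cached function
    print(cache.compile_count)
```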
>
> I call the PL/R function from PHP to read my data directly from the
> database, process it, and generate whatever charts I need. Here's a very
> simple example:
>
>
> The PL/R function:
> ------------------------------------------------------------------
> create type histtup as
> (
>    break float8,
>    count int
> );
>
> create or replace function hist(text, text)
> returns setof histtup as '
>   sql <- paste("select id_val from sample_numeric_data ",
>                "where ia_id=''", arg1, "''", sep="")
>   rs <- pg.spi.exec(sql)
>
>   if (!is.na(arg2)) {
>      x11(display=":5")
>      jpeg(file=arg2, width = 480, height = 480,
>           pointsize = 12, quality = 75)
>      par(ask = FALSE, bg = "#F8F8F8")
>      sql <- paste("select ia_attname as val from atts ",
>                   "where ia_id=''", arg1, "''", sep="")
>      attname <- pg.spi.exec(sql)
>      h <- hist(rs[,1], col = "blue",
>                main = paste("Histogram of", attname$val),
>                xlab = attname$val);
>      dev.off()
>      system(paste("chmod 666 ", arg2, sep=""),
>             intern = FALSE, ignore.stderr = TRUE)
>    }
>    else
>      h <- hist(rs[,1], plot = FALSE);
>
>    # drop the final break so breaks and counts have the same length
>    result <- data.frame(breaks = h$breaks[-length(h$breaks)],
>                         count = h$counts)
>
>    return(result)
> ' language 'plr';
> ------------------------------------------------------------------
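For readers new to R, here is roughly what `hist()` hands back to the query, sketched in Python. The list of breaks has one more entry than the list of counts, so the last break is dropped to pair each bin's left edge with its count, one row per `histtup`. The function `hist_rows` is hypothetical: it takes the breaks as given (R picks them with Sturges' rule by default) and follows R's default right-closed bins, with the lowest bin also including its lower edge.

```python
from bisect import bisect_left


def hist_rows(values, breaks):
    """Return (break, count) rows: the left edge of each bin and its count.

    Bins are right-closed, (a, b], and the first bin also includes its lower
    edge, mirroring R's hist() defaults right = TRUE, include.lowest = TRUE.
    """
    counts = [0] * (len(breaks) - 1)
    for v in values:
        if v < breaks[0] or v > breaks[-1]:
            raise ValueError(f"{v} lies outside the breaks")
        # bisect_left finds the first break >= v; the bin index is one less,
        # clamped to 0 so that v == breaks[0] lands in the lowest bin.
        i = max(bisect_left(breaks, v) - 1, 0)
        counts[i] += 1
    # len(breaks) == len(counts) + 1, so drop the final break.
    return list(zip(breaks[:-1], counts))


if __name__ == "__main__":
    print(hist_rows([0, 0.5, 1, 1.5, 2.5, 3], [0, 1, 2, 3]))
```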
>
> The PHP page:
> ------------------------------------------------------------------
> <HTML><BODY>
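> <?PHP
> // $_SERVER['PHP_SELF'] works without register_globals; escape it for HTML
> echo "
> <FORM ACTION='" . htmlspecialchars($_SERVER['PHP_SELF']) . "' METHOD='post' NAME='proto_form'>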
> <TABLE WIDTH='482' CELLSPACING='0' CELLPADDING='1' BORDER='0'>
>    <TR>
>      <TD>Data</TD>
>      <TD><INPUT TYPE='text' NAME='userdata' value='' size='80'></TD>
>    </TR>
>    <TR>
>      <TD colspan='2'>
>        <INPUT TYPE='submit' NAME='submit' value='Submit'>
>      </TD>
>    </TR>
> </TABLE>
> </FORM>
> ";
>
> if (isset($_POST['submit']) && $_POST['submit'] == "Submit")
> {
>    $tmpfilename = 'charts/hist1.jpg';
>    $conn = pg_connect("dbname=oscon user=postgres");
>    // escape user input before splicing it into the SQL string
>    $sql = "select * from hist('" . pg_escape_string($conn, $_POST['userdata']) .
>           "','" . "/tmp/" . $tmpfilename . "')";
>    $rs = pg_query($conn, $sql);
>    if ($rs) {
>       echo "<img src='$tmpfilename' border=0>";
>    }
> }
> ?>
> </BODY></HTML>
> ------------------------------------------------------------------
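Both the PL/R body and the PHP page build SQL by pasting strings into single-quoted literals. Inside such a literal an embedded quote is written twice, which is why the function body (itself a single-quoted literal) contains `''`, and why user input should be escaped before it is spliced into a query. Below is a minimal Python sketch of that rule, assuming `standard_conforming_strings` is on so that backslashes need no special handling. `quote_literal` and `build_hist_query` are hypothetical helpers, and a parameterized query is the safer choice wherever the client library offers one.

```python
def quote_literal(value: str) -> str:
    """Wrap value in single quotes, doubling any embedded single quotes.

    Assumes standard_conforming_strings is on, so backslashes are literal.
    """
    return "'" + value.replace("'", "''") + "'"


def build_hist_query(userdata: str, chart_path: str) -> str:
    """Hypothetical helper mirroring the query string the PHP page builds."""
    return f"select * from hist({quote_literal(userdata)}, {quote_literal(chart_path)})"


if __name__ == "__main__":
    print(build_hist_query("o'brien", "/tmp/charts/hist1.jpg"))
```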
>
>
> Hopefully this gives you some ideas about what is possible. If you're
> interested in PL/R, you can grab a copy (along with a patched 7.3.4
> source RPM for Postgres) here: http://www.joeconway.com/
>
> HTH,
>
> Joe