[R] PHP MySQL and R

Samuel A Mati samsonite at nyu.edu
Thu Dec 16 19:09:16 CET 2004


Hello everyone, this is my first post.  Nice to meet all of you.  I am 
having some trouble using R in combination with PHP and MySQL, and I 
would appreciate any assistance very much!  This is kind of long, so if 
you'd like a shorter version, let me know.

I am working on a project that takes a list of points (entered via the 
web and stored in a MySQL DB) and runs an R script on them.  The 
user receives an e-mail when the script is complete, and can view the 
output (which, ideally, will be stored in a database and formatted for 
the web by PHP).

What is the best way to do this?  

As of now, the user can upload the list of points, which are then 
stored in a database (see below).  When the user requests a job to 
be run, that is, for the server to use R to process the data, an entry 
is added to the jobs table.

Every x seconds, a daemon (written in PHP) checks the "jobs" table 
to see if any jobs are in the "processing" state.  If none are, 
the server is free to run a script, so the daemon chooses 
a job to run and sets its status to "processing".
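
To make the selection logic concrete, here is a rough sketch of it in R. 
The real daemon is PHP talking to MySQL; the data frame below is only a 
stand-in for the "jobs" table, the function name claim_next_job is made 
up, and picking the lowest queued id first is an assumption (I have not 
settled on an ordering).

```r
# Stand-in for the "jobs" table; in the real setup this lives in MySQL.
jobs <- data.frame(
  id      = 1:3,
  script  = c("bla.R", "bla.R", "bla.R"),
  dataset = c(10, 11, 12),
  status  = c("completed", "queued", "queued"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the updated table plus the id of the job
# that was started (NA if the server is busy or nothing is queued).
claim_next_job <- function(jobs) {
  # Only one job may be in the "processing" state at a time.
  if (any(jobs$status == "processing")) {
    return(list(jobs = jobs, started = NA_integer_))
  }
  queued <- jobs$id[jobs$status == "queued"]
  if (length(queued) == 0) {
    return(list(jobs = jobs, started = NA_integer_))
  }
  next_id <- min(queued)  # assumption: lowest id (oldest job) first
  jobs$status[jobs$id == next_id] <- "processing"
  list(jobs = jobs, started = next_id)
}

result <- claim_next_job(jobs)
```

Calling claim_next_job() again on result$jobs starts nothing, because 
one job is already marked as processing.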

At this point this information is available:
The R script that needs to be run.
The Dataset ID

-----------
Problem #1:
How should I call R so that it runs the script, let's call it "bla.R", on 
the points stored in the MySQL database?  Do I have PHP create a 
temporary file, call the R script with that filename as an argument, 
and have R just do read.table("temp.txt")?
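
To show what I mean by the temporary-file idea, here is a minimal sketch 
of how "bla.R" might look.  It assumes PHP has written the x, y and 
lagged columns to a whitespace-separated file with a header row, and 
that the script is started with the file name as its only argument 
(for example via Rscript); the function name run_job is made up.  I am 
not claiming this is the best way, which is really my question.

```r
# bla.R -- hypothetical sketch of the temporary-file approach.
# Assumed invocation:  Rscript bla.R /path/to/temp.txt

# Read the points from the file and fit the regression.
# Columns x, y, lagged are assumed to match the DATA table.
run_job <- function(path) {
  points <- read.table(path, header = TRUE)
  lm(x ~ y, data = points)  # same formula direction as lm(X~Y) in this post
}

# Only run when a readable file name was actually passed in.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 1 && file.exists(args[1])) {
  fit <- run_job(args[1])
  print(summary(fit))
}
```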
-----------

-----------
Problem #2:
The R script just runs linear regressions on the data.  I'd like to 
take only SOME of the values reported by the "summary" function.  Let's 
say we have a simple linear regression on the X and Y points:
fit <- lm(X~Y)

How can I get R to output something that can be easily split apart and 
stored into a DB?  I want the following values:

-Residuals Min
-Residuals 1Q
-Residuals Median
-Residuals 3Q
-Residuals Max
-The Residual Standard Error
-Multiple R-Squared
-Adjusted R-Squared
-F-Statistic
-etc..etc..

How can this be achieved?
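
Here is a rough sketch of the kind of output I am after, one 
"name<TAB>value" line per statistic so PHP can split it easily.  The 
points are made up, the names on the left (resid_min and so on) are my 
own invention, and I am assuming that the sigma, r.squared, 
adj.r.squared and fstatistic components of the summary object are the 
right ones to pull from.

```r
# Made-up points, purely for illustration.
X <- c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
Y <- c(1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1)

fit <- lm(X ~ Y)
s   <- summary(fit)

# Min, 1Q, Median, 3Q, Max of the residuals (default quantile probs).
res_q <- quantile(residuals(fit), names = FALSE)

# Collect the wanted values in one named numeric vector.
stats <- c(
  resid_min     = res_q[1],
  resid_q1      = res_q[2],
  resid_median  = res_q[3],
  resid_q3      = res_q[4],
  resid_max     = res_q[5],
  sigma         = s$sigma,                          # residual standard error
  r_squared     = s$r.squared,                      # multiple R-squared
  adj_r_squared = s$adj.r.squared,                  # adjusted R-squared
  f_statistic   = unname(s$fstatistic["value"])     # F-statistic
)

# One tab-separated "name value" line per statistic.
writeLines(sprintf("%s\t%.10g", names(stats), stats))
```

On the PHP side each line could then be split on the tab character and 
written to the database.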
-------------



====Database Structure====

A list of points is called a Dataset.

We have a table called "Datasets" which simply holds all the Datasets:

DATASETS
id
title

and a table "Data" which holds all the points of all the datasets:

DATA
id
ds_id
x
y
lagged

The points of a Dataset can be found from this query: "SELECT 
x,y,lagged FROM DATA WHERE ds_id=(whatever dataset)"


and the table "jobs"

JOBS
id
script (which R script to run)
dataset (which dataset to use)
status (queued, processing, or completed)

=======================


Thank you all so much for helping me out.  I appreciate it very much 
and am looking forward to figuring this out!
-Sam



