[R] Data mining, R and MySQL ...
Guazzelli, Alex
alex.guazzelli at zementis.com
Fri Jul 17 19:25:08 CEST 2009
The question is: your data lives in MySQL and you build your model in
R, but now you want to use that model to score your MySQL data on an
ongoing basis. What do you do?
MySQL users frequently use R for data mining and to build statistical
models. They benefit from the RMySQL package, which provides an
interface between R and MySQL. R (as well as a host of other
statistical tools) is able to export PMML (Predictive Model Markup
Language), which is the standard way to represent data mining models
(see the pmml package on CRAN and the PMML article in The R Journal).
Mind that building a model is a very different task from deploying or
executing one. The model development phase consists mostly of data
analysis and massaging, as well as feature selection. During model
execution, all you need to generate your decisions are the most
important data pieces (a much smaller subset of data fields than what
you used during model development). In addition, the required
pre-processing can be represented in PMML (for more on pre-processing
and PMML, see
http://adapasupport.zementis.com/2009/06/examining-pmml-40-part-i-pre-processing.html).
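To make the point about execution-time fields concrete, here is a
minimal sketch in Python (standard library only). It parses a small,
hand-written PMML 4.0 fragment and compares the fields declared in the
DataDictionary with those the model's MiningSchema actually uses. The
field names (age, income, zip_code, risk), the coefficients, and the
helper fields_by_usage are all made up for illustration; this is not
output from a real R model.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment (not exported from a real model).
PMML_DOC = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <DataDictionary numberOfFields="4">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="zip_code" optype="categorical" dataType="string"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age" usageType="active"/>
      <MiningField name="income" usageType="active"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.1">
      <NumericPredictor name="age" coefficient="0.02"/>
      <NumericPredictor name="income" coefficient="-0.00001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# Namespace of PMML 4.0 documents, bound to the prefix "p" for lookups.
NS = {"p": "http://www.dmg.org/PMML-4_0"}


def fields_by_usage(pmml_text):
    """Return (declared field names, {usageType: [field names]})."""
    root = ET.fromstring(pmml_text)
    declared = [f.get("name")
                for f in root.findall("p:DataDictionary/p:DataField", NS)]
    usage = {}
    for mf in root.iterfind(".//p:MiningSchema/p:MiningField", NS):
        # usageType defaults to "active" when the attribute is absent.
        usage.setdefault(mf.get("usageType", "active"), []).append(mf.get("name"))
    return declared, usage


declared, usage = fields_by_usage(PMML_DOC)
print("declared:", declared)
print("needed as input at scoring time:", usage["active"])
```

In this made-up example only age and income have to be pulled from the
database at scoring time, even though the dictionary declares more
fields.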
Model Deployment: Once a model exists, it can be easily uploaded to
the ADAPA Score Engine, which makes models available right away for
execution via Web Services. ADAPA is available as a service on the
Amazon Cloud. It is capable of executing models in real time and it
costs less than $1/hour.
Model Execution: The task then is to extract data from your MySQL
database, score it, and write the scored data back into the database.
You can easily do that by using yet another open source tool:
Jitterbit. It lets you map data from MySQL into a Web Service call to
ADAPA, which returns the scored data to Jitterbit for writing back to
MySQL.
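The extract, score, and write-back flow described above can be
sketched in a few lines of Python. This only illustrates the data
flow and is not how Jitterbit or ADAPA work internally: an in-memory
SQLite database stands in for MySQL, and the local function
score_record stands in for the Web Service call. The table name
(customers), its columns, and the scoring formula are hypothetical.

```python
import sqlite3


def score_record(age, income):
    """Stand-in for the Web Service call to the scoring engine.

    The formula is made up; a real deployment would send the record
    to the deployed model and read the score from the response.
    """
    return 0.1 + 0.02 * age - 0.00001 * income


def score_table(conn):
    """Score every row that has no score yet; return how many were scored."""
    # 1. Extract: pull only the fields the model needs, for unscored rows.
    rows = conn.execute(
        "SELECT id, age, income FROM customers WHERE score IS NULL"
    ).fetchall()
    # 2. Score: one call per record.
    scored = [(score_record(age, income), row_id)
              for row_id, age, income in rows]
    # 3. Write back: store each score next to its source row.
    conn.executemany("UPDATE customers SET score = ? WHERE id = ?", scored)
    conn.commit()
    return len(scored)


# Hypothetical table with two unscored customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, age REAL, income REAL, score REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, age, income) VALUES (?, ?, ?)",
    [(1, 30, 50000), (2, 45, 80000)],
)

print(score_table(conn))  # prints 2: both rows were unscored
print(score_table(conn))  # prints 0: nothing left to score
```

Selecting only rows whose score is still NULL means the same job can
be rerun on a schedule and will pick up just the new records, which
is the "ongoing basis" part of the question.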
Process in Detail - Blog: We have described this process step by step
here:
http://adapasupport.zementis.com/2009/04/scoring-data-from-your-database-in.html
Process in Detail - Video: We have also made a video describing this
process. Check it out at
http://www.zementis.com/videos/Jitterbit_Database.htm
Hope you find this information useful!