[R] Data mining, R and MySQL ...

Guazzelli, Alex alex.guazzelli at zementis.com
Fri Jul 17 19:25:08 CEST 2009


The question is: your data lives in MySQL and you have built your model
in R, but now you want to use that model to score your MySQL data on an
ongoing basis. What do you do?

MySQL users frequently use R for data mining and for building
statistical models. They benefit from the RMySQL package, which provides
an interface between R and MySQL. R (as well as a host of other
statistical tools) can export PMML (Predictive Model Markup Language),
the XML-based standard for representing data mining models (see the pmml
package on CRAN and the PMML article in The R Journal).
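
To make the idea of a model represented as PMML more concrete, here is a
minimal Python sketch that reads a tiny linear regression out of a PMML
document and scores one record with it. The PMML text is hand-written
for illustration and is not output from the R pmml package; the field
names (age, income, y), the coefficients, and the helper functions
load_linear_model and score are all assumptions. A real scoring engine
handles many more model types and the pre-processing elements as well.

```python
import xml.etree.ElementTree as ET

# Hand-written toy PMML document (an assumption for illustration, not
# produced by any tool). It encodes y = 1.5 + 2.0*age + 0.5*income.
PMML_DOC = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="toy linear model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="2.0"/>
      <NumericPredictor name="income" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# PMML elements live in a versioned XML namespace.
NS = {"p": "http://www.dmg.org/PMML-4_0"}


def load_linear_model(pmml_text):
    """Return (intercept, {field: coefficient}) from a PMML regression."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefs = {
        pred.get("name"): float(pred.get("coefficient"))
        for pred in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefs


def score(model, record):
    """Apply the linear model to one record given as a dict of fields."""
    intercept, coefs = model
    return intercept + sum(c * record[name] for name, c in coefs.items())


model = load_linear_model(PMML_DOC)
prediction = score(model, {"age": 10.0, "income": 4.0})
```

Note that scoring only touches the fields named in the model, which is
why the file can travel between tools independently of the training
data.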

Keep in mind that building a model is a very different task from
deploying or executing one. Model development consists mostly of data
analysis and massaging, as well as feature selection. Model execution
needs only the data fields the final model actually uses (typically far
fewer fields than were examined during development) to generate
decisions. In addition, the required pre-processing can be represented
in PMML itself (for more on pre-processing and PMML, see
http://adapasupport.zementis.com/2009/06/examining-pmml-40-part-i-pre-processing.html).

Model Deployment: Once a model exists, it can easily be uploaded to
the ADAPA Score Engine, which makes it available right away for
execution via Web Services. ADAPA is available as a service on the
Amazon Cloud. It can execute models in real time and costs less than
$1/hour.

Model Execution: The task then is to extract data from your MySQL
database, score it, and write the scored data back into the database.
You can easily do that with yet another open source tool: Jitterbit. It
maps data from MySQL into a Web Service call to ADAPA, which returns the
scores to Jitterbit, which in turn writes them back to MySQL.
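
The extract, score, and write-back loop described above can be sketched
in a few lines of Python. This is only an illustration of the data flow,
not how Jitterbit or ADAPA work internally: an in-memory SQLite database
stands in for MySQL so the example runs anywhere, and score_record is a
hypothetical local function standing in for the Web Service call to the
scoring engine. The customers table, its columns, and the toy scoring
formula are assumptions as well.

```python
import sqlite3


def score_record(record):
    # Hypothetical stand-in for the remote scoring call. A real setup
    # would send the record to the scoring engine and read back a score.
    return 1.5 + 2.0 * record["age"] + 0.5 * record["income"]


def score_unscored_rows(conn, scorer):
    """Score every row that has no score yet and write the result back."""
    rows = conn.execute(
        "SELECT id, age, income FROM customers WHERE score IS NULL"
    ).fetchall()
    for row_id, age, income in rows:
        result = scorer({"age": age, "income": income})
        conn.execute(
            "UPDATE customers SET score = ? WHERE id = ?", (result, row_id)
        )
    conn.commit()
    return len(rows)


# SQLite in memory stands in for the MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, age REAL, income REAL, score REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, age, income) VALUES (?, ?, ?)",
    [(1, 10.0, 4.0), (2, 20.0, 8.0)],
)
conn.commit()

n_scored = score_unscored_rows(conn, score_record)
```

Selecting only rows where score IS NULL means the same job can be run
repeatedly and will pick up just the records added since the last run,
which fits scoring on an ongoing basis.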

Process in Detail - Blog: We have described this process step by
step here:

http://adapasupport.zementis.com/2009/04/scoring-data-from-your-database-in.html

Process in Detail - Video: We have also made a video describing this  
process. Check it out at

http://www.zementis.com/videos/Jitterbit_Database.htm

Hope you find this information useful!



