[R] MySql Versus R

Fri Apr 1 13:32:43 CEST 2011

On Fri, Apr 1, 2011 at 6:46 AM, Henri Mone <henriMone at gmail.com> wrote:
> Dear R Users,
>
> I use for my data crunching a combination of MySQL and GNU R. I have
> to handle huge/ middle seized data which is stored in a MySql
> database, R executes a SQL command to fetch the data and does the
> plotting with the build in R plotting functions.
>
> The (low level) calculations like summing, dividing, grouping, sorting
> etc. can be done either with the sql command on the MySQL side or on
> the R side.
> My question is what is faster for this low level calculations / data
> rearrangement MySQL or R? Is there a general rule of thumb what to
> shift to the MySql side and what to the R side?

The sqldf package makes it easy to use sqlite, h2 or postgresql from R
to carry out data manipulation tasks and this has facilitated some
benchmarks by users using sqlite's capability of using an "in memory"
database.   In the cases cited on the sqldf home page sqlite was
faster than R despite the overhead of moving the data into the
database and out again. See http://sqldf.googlecode.com

In general the answer would depend on the database, what has been
cached, the particular query, size of data and how well you had
optimized your sql and R queries.   There are entire books on
optimizing MySQL so this is an extensive subject.  Various comparisons
of different approaches can easily result in different ordering from
fastest to slowest based on what would appear to be relatively minor
aspects of the problem so you would have to benchmark the important
queries to really get an answer that pertains to your situation.
Check out the rbenchmark package for this which makes it relatively
simple to do.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com