[R] R and databases - a comment

charles loboz charles_loboz at yahoo.com
Sun Dec 4 05:06:41 CET 2005


1. That was a part of a private email exchange. It has
been suggested that more people may be interested. 

2. I did use various databases (significant part of my
job) for the last 15 years. Some with R for the last 3
years as a hobby. Some comments on the ones used
below. Sorry, no links - I am time-constrained at the
moment - please google if interested in details. The
remarks are from the point of view of R user, not that
of 'general database user'.
 
3. SQLITE. www.sqlite.org - probably the best datase
to use with R. No setup, no administration, embedded -
so less connection overhead. All data in one file - so
easy to transfer. Solid. Very functional SQL, fast if
you play it right (almost as fast as SQLServer on
Windows...) . Some limitations - no stored procedures.
Some preprocessing/parsing can be done using TCL -
well integrated with sqlite if you need that. Due to
the implementation quirk you can even compute
recursive functions (like exponential moving average
or Fibonacci numbers) with SQL :-). Easy import/export
of data to text files. After trying few other dbs I
settled down on this one. Even considered writing a
tutorial on SQLite use with R (like how to process
gigabytes of data on a 128mb computer :-) ) - but time
constraints stopped me. [Personally I think that
SQLite should come bundled with the standard R
installation. Could even be used to keep a lot of R's
internal stuff, would probably simplify overall
coding. But that is for others to decide]
 
All other databases (including mysql) require typical
setup - installation, administration, user rights,
keeping track of ports, services/daemons, directories,
backups etc - so some db administrative skills are
required.I am not sure how many R users are willing to
go through that. The ones who may be interested in the
stuff below
 
4. www.postgres.org Postgres. Free. As complete as one
can wish, small download, great functionality.
Interfaces well to other languages, so you can do
numerics in C++ and store that in the database (though
why not do numerics in R?). Current version 8.1, much
improved. 
 
5. Firebird. open source verion of Interbase. Easy
setup and can have all data in one file. But... slow
development - not many developers there. SQL full but
somewhat quirky (when porting from other dialects). 
 
6. Mysql. the inheritance from the original ISAM
system still shows. Nice user interface, but... if you
need real db why not use postgres? if you need
something simpler, without administration, why not use
SQLITE? No doubt mysql is fine for many simple
websites etc - this is mysql's niche.
 
7. derby and hsqldb. both are written in Java, open
source. HSQLDB (used now by OpenOffice) allows
creation of in-memory tables and it's fast there - but
it's usage from inside R is tricky - there is no
easily available, installable and current ODBC driver.
Similar for derby - the ODBC driver is there, but
installation can be tricky to non-professionals. May
be in the future...
 
There are three 'express' versions of commercial
databases. They all share some restrictions, like max
disc data size 2-4gb, max mem size 1-2gb and usage of
single processor only. Plus various licensing
restrictions, so be careful how you use them. 
 
 - Microsoft - in beta now, over 100mb download
(windows only) (the old version, MSDE, is also
available)
 - Oracle - 150mb download, if i remember correctly
even free to distribute, but check the license
 - DB2 - 500mb download, currently 90 day version, IBM
strong rumour is that early next year the new version
will be free. 
 
Each commercial DB has some OLAP capability, but I am
not sure how much of it is/will be available in the
Express version.


		
__________________________________________ 

Just $16.99/mo. or less. 
dsl.yahoo.com




More information about the R-help mailing list