[R-sig-DB] Is any database particularly better at "exchanging" large datasets with R?
Bill Northcott
w.northcott at unsw.edu.au
Wed Feb 13 02:36:28 CET 2008
Some time back, Thomas wrote:
> Is any database particularly better at "exchanging" data with R?
> Background:
> Sometime during the next 12 months, I plan on configuring a new
> computer system on which I will primarily run "R" and a SQL database
> (Microsoft SQL Server, MySQL, Oracle, etc). My primary goal is to
> "optimize" the system for R, and for passing data to and from R and
> the database.
> I work with large datasets, and therefore I "think" one of my most
> important goals should be to maximize the amount of RAM that R can
> utilize effectively.
Firstly, as has already been suggested, nothing beats testing whatever
setup you have in mind.
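For example, a crude but effective test is simply to time a
representative round trip from your candidate store into R.  A minimal
sketch, assuming an open DBI connection 'con'; the table and file
names are placeholders:

  ## time pulling a large table from the database into R
  system.time(x <- dbGetQuery(con, "SELECT * FROM bigtable"))
  ## compare with reading the same data from a flat file
  system.time(y <- read.table("bigtable.txt", header = TRUE))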
Secondly, assuming this data is just for your own use rather than a
shared database updated by many people, almost everything that has
been mentioned so far is basically unsuitable.  SQL Server, MySQL,
Postgres, Oracle etc. devote most of their many megabytes of code to a
vast number of features, such as access control, transaction rollback,
stored procedures and logging, which are almost certainly of no use to
you, will slow things down and cause admin headaches.  If you really
want SQL, look at SQLite.  It is free, it is just a SQL storage engine
without any of the overhead you don't need (it is a 2.3MB library on
my computer, and that is for four architectures!), and it will scale
to TB-size datasets.
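If you do go that way, the RSQLite package drives SQLite through R's
DBI interface.  A minimal sketch; the file name, table name and the
'mydata' data frame are made up for illustration:

  library(DBI)
  library(RSQLite)

  ## SQLite is just a file on disk, no server to administer
  con <- dbConnect(SQLite(), dbname = "bigdata.db")

  ## write a data frame out once ...
  dbWriteTable(con, "measurements", mydata)

  ## ... then pull back only the rows you need, which keeps
  ## R's memory use down
  big <- dbGetQuery(con,
      "SELECT * FROM measurements WHERE value > 100")

  dbDisconnect(con)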
If all you want is to store and access large amounts of data, then you
probably don't want SQL at all.  Someone mentioned BLOBs, but that
might be hard work to program.  You might do well to look at HDF5
(http://hdf.ncsa.uiuc.edu/index.html).  This is a storage format
specifically designed for storing very large amounts of
scientific/engineering data.  Again it is free and open source, and it
has an R interface.
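As a sketch of what that looks like in practice, here is one such
interface, the Bioconductor rhdf5 package (the file, dataset and
'mymatrix' names are invented for illustration):

  library(rhdf5)

  ## create a file, write an R matrix into it as a named dataset,
  ## and read it back later without loading anything else
  h5createFile("results.h5")
  h5write(mymatrix, "results.h5", "mymatrix")
  back <- h5read("results.h5", "mymatrix")

h5read also takes an index argument, so you can pull back just a
slice of a large array rather than the whole thing.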
Cheers
Bill Northcott