[R-sig-DB] Is any database particularly better at "exchanging" large datasets with R?

Bill Northcott w@northcott @end|ng |rom un@w@edu@@u
Wed Feb 13 02:36:28 CET 2008


Some time back, Thomas wrote:
> Is any database particularly better at "exchanging" data with R?
> Background:
> Sometime during the next 12-months, I plan on configuring a new  
> computer system on which I will primarily run "R" and a SQL database  
> (Microsoft SQL Server, MySQL, Oracle, etc). My primary goal is to  
> "optimize" the system for R, and for passing data to and from R and  
> the database.
>  I work with large datasets, and therefore I "think" one of my most  
> important goals should be to maximize the amount of RAM that R can  
> utilize effectively.
Firstly, as has already been suggested, nothing beats testing whatever
setup you have in mind.
Secondly, assuming this data is just for your own use rather than a
shared database that many people will update, almost everything that
has been mentioned so far is basically unsuitable.  SQL Server, MySQL,
Postgres, Oracle etc. devote most of their many megabytes of code to a
vast number of features like access control, transaction rollback,
stored procedures, logging and so on, which are almost certainly of no
use to you, will slow things down and will cause admin headaches.  If
you really want SQL, look at SQLite.  It is free.  It is just a SQL
storage system without any of the overhead you don't need (it is a
2.3MB library on my computer, and that is for four architectures!) and
will scale to TB-size datasets.
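
To make the "no overhead" point concrete, here is a minimal sketch of
SQLite used as an embedded library: no server, no accounts, just a
connection to a file (or to memory).  It is written in Python only
because the sqlite3 module ships with the standard library; the table
name "obs" and its columns are made up for illustration.  The same
pattern should carry over to R through an SQLite driver, though that is
an assumption and is not shown here.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; passing a file
# path instead would persist the data on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table of observations (names are illustrative only).
conn.execute("CREATE TABLE obs (id INTEGER PRIMARY KEY, grp TEXT, value REAL)")

# Bulk-load 1000 rows inside a single transaction.
rows = [(i, "a" if i % 2 == 0 else "b", i * 0.5) for i in range(1000)]
with conn:
    conn.executemany("INSERT INTO obs VALUES (?, ?, ?)", rows)

# Let SQLite do the aggregation so only the small result set has to be
# held in the client's memory.
summary = conn.execute(
    "SELECT grp, COUNT(*), AVG(value) FROM obs GROUP BY grp ORDER BY grp"
).fetchall()
print(summary)

conn.close()
```

Pushing the GROUP BY into the database is the part that matters for
large datasets: the client only ever sees one row per group rather than
the whole table.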
If all you want is to store and access large amounts of data, then you
probably don't want SQL at all.  Someone mentioned BLOBs, but that
might be hard work to program.  You might do well to look at HDF5
(http://hdf.ncsa.uiuc.edu/index.html).  This is a storage format
specifically designed for storing very large amounts of
scientific/engineering data.  Again, it is free and open source and has
an R interface.
Cheers
Bill Northcott



