[R-sig-DB] Storing R objects (was [R] advice requested re: building "good" system (R, SQL db) for handling large datasets)

Jeffrey Horner jeff.horner at vanderbilt.edu
Thu Feb 7 18:39:13 CET 2008


Richard Pearson wrote on 02/07/2008 06:16 AM:
> (moved to R-sig-db from R-help)
> 
> Jeff,
> 
> I have a project where I want to create large numbers of large, complex 
> objects (e.g. Bioconductor ExpressionSet objects). I want to store these 
> along with metadata (such as what raw data and parameters were used to 
> create the object). I will later want to access subsets of these 
> objects, with the subset specified by a query. It seems to me the 
> natural way to do this would be to store the metadata and the objects 
> themselves in database tables, and I have assumed that the objects would 
> need to be serialised and stored as BLOBs. It sounds like at present 
> there are no plans for infrastructure that would allow me to do this, 
> but I would be interested to know if anyone plans to make such a 
> scenario possible in the future.
> 
> I am assuming in the above that it is not possible to store arbitrarily 
> complex R objects in a DB, without a lot of work coercing all the 
> various slots in the object to data.frames, and saving the data.frames 
> to different tables. I've had a quick scan through the documentation for 
> DBI, RODBC, RMySQL and ROracle, but couldn't see any such functionality.
> 
> An alternative for my situation would be to store the R objects as files 
> (using save) and store the metadata and filenames in a DB, but this 
> seems to me to add an extra layer of complexity/maintenance. Finally, I 
> could of course save everything as files, but one of the reasons for 
> storing things in a DB is because I would like to create dynamic web 
> pages linked to metadata and results data in the DB.

Richard, I humbly suggest you actually benchmark how long it takes to 
retrieve a 2GB object from the filesystem into R. Then, add the time it 
takes to subset the object and print it on the console. Now, add the 
overhead of constructing the web page of that subset. Will the users of 
your web application wait that long for their results? Now swap out the 
filesystem and place the objects in the DB; that will obviously be slower, 
right?
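
A minimal sketch of such a benchmark, using only base R. The object here is a 
small hypothetical stand-in (a list with a numeric matrix and a sample table), 
not a real ExpressionSet; the names obj, exprs and meta are made up for 
illustration, and saveRDS/readRDS is just one assumed way of writing the file. 
Scale n up to approximate the sizes you actually expect.

```r
# Hypothetical stand-in for a large object; increase n to mimic real sizes.
n <- 1e6
obj <- list(
  exprs = matrix(rnorm(n), ncol = 100),
  meta  = data.frame(sample = sprintf("s%03d", 1:100),
                     stringsAsFactors = FALSE)
)
colnames(obj$exprs) <- obj$meta$sample

# Write the object to a temporary file once, outside the timed section.
path <- tempfile(fileext = ".rds")
saveRDS(obj, path)

# Time each step separately: read from disk, subset, print.
t_read   <- system.time(loaded <- readRDS(path))
t_subset <- system.time(part <- loaded$exprs[, c("s001", "s002"), drop = FALSE])
t_print  <- system.time(out <- capture.output(print(head(part))))

# Collect elapsed wall-clock seconds for each step.
timings <- c(read   = t_read[["elapsed"]],
             subset = t_subset[["elapsed"]],
             print  = t_print[["elapsed"]])
print(timings)

unlink(path)
```

The read step is the one to watch: it scales with the size of the whole 
object, no matter how small the subset you finally display.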

Consider splitting your objects into a coherent db schema and only pull 
into R, or a web page, the parts that you want to analyze and display.
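
As a rough illustration of what that could look like, here is a sketch that 
assumes the DBI package with an in-memory SQLite database via RSQLite (my 
choice of backend for the example, not something from this thread). The table 
and column names (dataset, expression, raw_file, params, feature, sample, 
value) and all the data are hypothetical.

```r
library(DBI)

# Assumption: RSQLite is installed; any DBI backend would do in principle.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical schema: one metadata row per dataset, one row per measurement.
dbExecute(con, "CREATE TABLE dataset (
                  dataset_id INTEGER PRIMARY KEY,
                  raw_file   TEXT,
                  params     TEXT)")
dbExecute(con, "CREATE TABLE expression (
                  dataset_id INTEGER REFERENCES dataset(dataset_id),
                  feature    TEXT,
                  sample     TEXT,
                  value      REAL)")

# Made-up metadata for two datasets.
dbWriteTable(con, "dataset",
             data.frame(dataset_id = 1:2,
                        raw_file   = c("run1.dat", "run2.dat"),
                        params     = c("method=A", "method=B"),
                        stringsAsFactors = FALSE),
             append = TRUE)

# Made-up measurements: 2 datasets x 3 features x 2 samples = 12 rows.
expr <- expand.grid(dataset_id = 1:2,
                    feature    = c("f1", "f2", "f3"),
                    sample     = c("s1", "s2"),
                    stringsAsFactors = FALSE)
expr$value <- as.numeric(seq_len(nrow(expr)))
dbWriteTable(con, "expression", expr, append = TRUE)

# Pull into R only the slice selected by a metadata query.
wanted <- dbGetQuery(con,
  "SELECT e.feature, e.sample, e.value
     FROM expression e
     JOIN dataset d ON d.dataset_id = e.dataset_id
    WHERE d.params = ? AND e.feature = ?",
  params = list("method=A", "f2"))
print(wanted)

dbDisconnect(con)
```

Only the rows matching the query ever cross into R, so the cost follows the 
size of the subset rather than the size of the stored object.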

Jeff

> 
> Best wishes
> 
> Richard.
> 
> 
> Jeffrey Horner wrote:
>> Richard Pearson wrote on 02/06/2008 06:25 AM:
>>> Hi Thomas
>> [...]
>>> With databases, one issue that might be relevant is whether you want 
>>> to store data in tables (e.g. one table to store one data.frame) that 
>>> can subsequently be manipulated in the DB, or to store R objects as R 
>>> objects (e.g. as BLOBs). My situation is likely to be the latter case, 
>>> and one of my concerns is that many DBs have an upper limit of 2GB on 
>>> BLOBs, and I might potentially have objects that are larger than this.
>> [...]
>>
>> I'd be curious as to why you'd want to store and retrieve R objects 
>> from a BLOB column in a table. I've often thought about this, but 
>> unfortunately neither the DBI package nor the RODBC package supports this.
>>
>> Jeff



