[R-sig-DB] Storing R objects (was [R] advice requested re: building "good" system (R, SQL db) for handling large datasets)

Richard Pearson r|ch@rd@pe@r@on @end|ng |rom po@tgr@d@m@nche@ter@@c@uk
Fri Feb 8 12:51:47 CET 2008


I have perhaps confused the issue by mentioning the web application. The 
web application will only be based on small tables of results and 
metadata - I will not need any access to the large objects from the web 
application. I will however need access to the large objects from R, so 
I am thinking about how I should organise the storage of these objects. 
I think I will use Sean's fine suggestion (worth far more than $0.02!), 
but will store my large objects as files, rather than in the DB.
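
For the record, here is a minimal sketch of what I have in mind: each object goes into its own file, and a DB table holds the metadata plus the file path, so a query returns only the paths that need loading. This is an illustration under assumptions rather than a finished design. The table and column names, the helper functions store_object() and load_objects(), and the toy objects are all hypothetical, and an in-memory SQLite database (via the DBI and RSQLite packages) stands in for whatever DB is actually used. It also uses saveRDS()/readRDS() for one object per file, where save()/load() would work too.

```r
# Sketch only: object files on disk, metadata and file paths in a DB table.
# Table name, column names and the toy objects are hypothetical.
library(DBI)

store_dir <- file.path(tempdir(), "objects")
dir.create(store_dir, showWarnings = FALSE)

con <- dbConnect(RSQLite::SQLite(), ":memory:")  # stand-in for the real DB
dbExecute(con, "CREATE TABLE objects (
                  id         INTEGER PRIMARY KEY,
                  raw_data   TEXT,
                  parameters TEXT,
                  path       TEXT)")

# Save one object to its own file and record its metadata and path.
store_object <- function(con, obj, raw_data, parameters) {
  path <- tempfile(pattern = "obj_", tmpdir = store_dir, fileext = ".rds")
  saveRDS(obj, path)
  dbExecute(con,
            "INSERT INTO objects (raw_data, parameters, path) VALUES (?, ?, ?)",
            params = list(raw_data, parameters, path))
  invisible(path)
}

# Load only the objects whose metadata match the query.
load_objects <- function(con, raw_data) {
  hits <- dbGetQuery(con,
                     "SELECT path FROM objects WHERE raw_data = ? ORDER BY id",
                     params = list(raw_data))
  lapply(hits$path, readRDS)
}

store_object(con, matrix(1:6, nrow = 2),  "batch_A", "norm=rma")
store_object(con, matrix(7:12, nrow = 2), "batch_B", "norm=rma")
store_object(con, list(x = 1:3),          "batch_A", "norm=mas5")

# Only the two files recorded against "batch_A" are read from disk.
subset_a <- load_objects(con, "batch_A")

dbDisconnect(con)
```

The web application would then only ever touch the small metadata table, while R follows the stored paths when it needs the large objects.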

Many thanks to Jeff, Sean and Dirk for the great replies - much appreciated!

Richard.


Jeffrey Horner wrote:
> Richard Pearson wrote on 02/07/2008 06:16 AM:
>> (moved to R-sig-db from R-help)
>>
>> Jeff,
>>
>> I have a project where I want to create large numbers of large, 
>> complex objects (e.g. Bioconductor ExpressionSet objects). I want to 
>> store these along with metadata (such as what raw data and parameters 
>> were used to create the object). I will later want to access subsets 
>> of these objects, with the subset specified by a query. It seems to 
>> me the natural way to do this would be to store the metadata and the 
>> objects themselves in database tables, and I have assumed that the 
>> objects would need to be serialised and stored as BLOBs. It sounds 
>> like at present there are no plans for infrastructure that would 
>> allow me to do this, but I would be interested to know if anyone 
>> plans to make such a scenario possible in the future.
>>
>> I am assuming in the above that it is not possible to store 
>> arbitrarily complex R objects in a DB, without a lot of work coercing 
>> all the various slots in the object to data.frames, and saving the 
>> data.frames to different tables. I've had a quick scan through the 
>> documentation for DBI, RODBC, RMySQL and ROracle, but couldn't see 
>> any such functionality.
>>
>> An alternative for my situation would be to store the R objects as 
>> files (using save) and store the metadata and filenames in a DB, but 
>> this seems to me to add an extra layer of complexity/maintenance. 
>> Finally, I could of course save everything as files, but one of the 
>> reasons for storing things in a DB is because I would like to create 
>> dynamic web pages linked to metadata and results data in the DB.
>
> Richard, I humbly suggest you actually benchmark how long it takes to 
> retrieve a 2GB object from the filesystem into R. Then, add the time 
> it takes to subset the object and print it on the console. Now, add 
> the overhead of constructing the web page of that subset. Will the 
> users of your web application wait that long for their results? Now 
> swap out the filesystem and place the objects in the DB; that would 
> obviously be slower, right?
>
> Consider splitting your objects into a coherent db schema and only 
> pull into R, or a web page, the parts that you want to analyze and 
> display.
>
> Jeff
>
>>
>> Best wishes
>>
>> Richard.
>>
>>
>> Jeffrey Horner wrote:
>>> Richard Pearson wrote on 02/06/2008 06:25 AM:
>>>> Hi Thomas
>>> [...]
>>>> With databases, one issue that might be relevant is whether you 
>>>> want to store data in tables (e.g. one table to store one 
>>>> data.frame) that can subsequently be manipulated in the DB, or to 
>>>> store R objects as R objects (e.g. as BLOBs). My situation is 
>>>> likely to be the latter case, and one of my concerns is that many 
>>>> DBs have an upper limit of 2GB on BLOBs, and I might potentially 
>>>> have objects that are larger than this.
>>> [...]
>>>
>>> I'd be curious as to why you'd want to store and retrieve R objects 
>>> from a BLOB column in a table. I've often thought about this, but 
>>> unfortunately neither the DBI package nor the RODBC package supports 
>>> this.
>>>
>>> Jeff
>
>



