[R-sig-DB] Storing R objects (was [R] advice requested re: building "good" system (R, SQL db) for handling large datasets)
Richard Pearson
r|ch@rd@pe@r@on @end|ng |rom po@tgr@d@m@nche@ter@@c@uk
Fri Feb 8 12:51:47 CET 2008
I have perhaps confused the issue by mentioning the web application. The
web application will only be based on small tables of results and
metadata - I will not need any access to the large objects from the web
application. I will however need access to the large objects from R, so
I am thinking about how I should organise the storage of these objects.
I think I will use Sean's fine suggestion (worth far more than $0.02!),
but will store my large objects as files, rather than in the DB.
Many thanks to Jeff, Sean and Dirk for the great replies - much appreciated!
Richard.
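Below is a minimal sketch of the files-plus-metadata layout described above. It is written in Python rather than R. `pickle` stands in for R's `save()`, and an in-memory SQLite database stands in for the metadata DB. The table `objects`, its columns, the parameter strings, and the helpers `store_object`/`load_objects` are all hypothetical names chosen for illustration.

```python
import os
import pickle
import sqlite3
import tempfile

def store_object(conn, obj_dir, name, obj, params):
    """Write obj to its own file and record the metadata plus file path in the DB."""
    path = os.path.join(obj_dir, name + ".pkl")
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
    conn.execute(
        "INSERT INTO objects (name, params, path) VALUES (?, ?, ?)",
        (name, params, path),
    )
    conn.commit()

def load_objects(conn, params):
    """Query the metadata, then read only the matching files from disk."""
    rows = conn.execute(
        "SELECT name, path FROM objects WHERE params = ? ORDER BY name", (params,)
    ).fetchall()
    result = {}
    for name, path in rows:
        with open(path, "rb") as fh:
            result[name] = pickle.load(fh)
    return result

# Hypothetical setup: a scratch directory for the object files, SQLite for metadata.
obj_dir = tempfile.mkdtemp()
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (name TEXT PRIMARY KEY, params TEXT, path TEXT NOT NULL)"
)
store_object(conn, obj_dir, "run1", {"expr": [[1.0, 2.0], [3.0, 4.0]]}, "normalisation=A")
store_object(conn, obj_dir, "run2", {"expr": [[5.0, 6.0]]}, "normalisation=B")

loaded = load_objects(conn, "normalisation=A")
print(sorted(loaded))  # ['run1']
```

The database only holds short strings, so a web application can query it cheaply. The large objects are read from disk only when they are actually needed. The trade-off is the maintenance cost mentioned in the thread: rows and files must be kept in sync, for example when an object is deleted or moved.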
Jeffrey Horner wrote:
> Richard Pearson wrote on 02/07/2008 06:16 AM:
>> (moved to R-sig-db from R-help)
>>
>> Jeff,
>>
>> I have a project where I want to create large numbers of large,
>> complex objects (e.g. bioconductor ExpressionSet objects). I want to
>> store these along with metadata (such as what raw data and parameters
>> were used to create the object). I will later want to access subsets
>> of these objects, with the subset specified by a query. It seems to
>> me the natural way to do this would be to store the metadata and the
>> objects themselves in database tables, and I have assumed that the
>> objects would need to be serialised and stored as BLOBs. It sounds
>> like at present there are no plans for infrastructure that would
>> allow me to do this, but I would be interested to know if anyone
>> plans to make such a scenario possible in the future.
>>
>> I am assuming in the above that it is not possible to store
>> arbitrarily complex R objects in a DB, without a lot of work coercing
>> all the various slots in the object to data.frames, and saving the
>> data.frames to different tables. I've had a quick scan through the
>> documentation for DBI, RODBC, RMySQL and ROracle, but couldn't see
>> any such functionality.
>>
>> An alternative for my situation would be to store the R objects as
>> files (using save) and store the metadata and filenames in a DB, but
>> this seems to me to add an extra layer of complexity/maintenance.
>> Finally, I could of course save everything as files, but one of the
>> reasons for storing things in a DB is that I would like to create
>> dynamic web pages linked to metadata and results data in the DB.
>
> Richard, I humbly suggest you actually benchmark how long it takes to
> retrieve a 2GB object from the filesystem into R. Then, add the time
> it takes to subset the object and print it on the console. Now, add
> the overhead of constructing the web page of that subset. Will the
> users of your web application wait that long for their results? Now
> swap out the filesystem and place the objects in the DB; that will
> obviously be slower, right?
>
> Consider splitting your objects into a coherent db schema and only
> pull into R, or a web page, the parts that you want to analyze and
> display.
>
> Jeff
>
>>
>> Best wishes
>>
>> Richard.
>>
>>
>> Jeffrey Horner wrote:
>>> Richard Pearson wrote on 02/06/2008 06:25 AM:
>>>> Hi Thomas
>>> [...]
>>>> With databases, one issue that might be relevant is whether you
>>>> want to store data in tables (e.g. one table to store one
>>>> data.frame) that can subsequently be manipulated in the DB, or to
>>>> store R objects as R objects (e.g. as BLOBs). My situation is
>>>> likely to be the latter case, and one of my concerns is that many
>>>> DBs have an upper limit of 2GB on BLOBs, and I might potentially
>>>> have objects that are larger than this.
>>> [...]
>>>
>>> I'd be curious as to why you'd want to store and retrieve R objects
>>> from a BLOB column in a table. I've often thought about this, but
>>> unfortunately neither the DBI package nor the RODBC package support
>>> this.
>>>
>>> Jeff
>
>
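For comparison, here is a minimal sketch of the BLOB approach Richard originally asked about, again in Python. `pickle` stands in for R's `serialize()`, and the serialised bytes go into a BLOB column of a hypothetical SQLite table `objects`. The helpers `put_blob`/`get_blob` and the parameter string are illustrative names, not part of any R package.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (name TEXT PRIMARY KEY, params TEXT, data BLOB)")

def put_blob(conn, name, params, obj):
    # Serialise the whole object to bytes; sqlite3 binds Python bytes as a BLOB.
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    conn.execute("INSERT INTO objects VALUES (?, ?, ?)", (name, params, payload))
    conn.commit()

def get_blob(conn, name):
    # The entire BLOB must be fetched and deserialised before any subsetting.
    row = conn.execute("SELECT data FROM objects WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    return pickle.loads(row[0])

put_blob(conn, "run1", "normalisation=A", {"expr": [[1.0, 2.0], [3.0, 4.0]]})
restored = get_blob(conn, "run1")
print(restored["expr"][1])  # [3.0, 4.0]
```

Because `get_blob` has to read and deserialise the whole object before any part of it can be used, retrieval cost grows with object size. Jeff's suggested benchmark would expose exactly this cost. Size limits apply as well: SQLite, for instance, caps a single BLOB at 1,000,000,000 bytes by default (`SQLITE_MAX_LENGTH`).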