[R-sig-DB] Storing R objects (was [R] advice requested re: building "good" system (R, SQL db) for handling large datasets)

Sean Davis @d@v|@2 @end|ng |rom m@||@n|h@gov
Thu Feb 7 13:56:57 CET 2008


On Feb 7, 2008 7:16 AM, Richard Pearson
<richard.pearson using postgrad.manchester.ac.uk> wrote:
> (moved to R-sig-db from R-help)
>
> Jeff,
>
> I have a project where I want to create large numbers of large, complex
> objects (e.g. bioconductor ExpressionSet objects). I want to store these
> along with metadata (such as what raw data and parameters were used to
> create the object). I will later want to access subsets of these
> objects, with the subset specified by a query. It seems to me the
> natural way to do this would be to store the metadata and the objects
> themselves in database tables, and I have assumed that the objects would
> need to be serialised and stored as BLOBs. It sounds like at present
> there are no plans for infrastructure that would allow me to do this,
> but I would be interested to know if anyone plans to make such a
> scenario possible in the future.
>
> I am assuming in the above that it is not possible to store arbitrarily
> complex R objects in a DB, without a lot of work coercing all the
> various slots in the object to data.frames, and saving the data.frames
> to different tables. I've had a quick scan through the documentation for
> DBI, RODBC, RMySQL and ROracle, but couldn't see any such functionality.
>
> An alternative for my situation would be to store the R objects as files
> (using save) and store the metadata and filenames in a DB, but this
> seems to me to add an extra layer of complexity/maintenance. Finally, I
> could of course save everything as files, but one of the reasons for
> storing things in a DB is because I would like to create dynamic web
> pages linked to metadata and results data in the DB.

This type of application comes up often in web design.  The general
thinking is that storing large objects (images, for example) on disk
is just fine.  I would suggest creating functions like:

queryMetadata()            # returns a list of ExpressionSet keys
fetchExprSets()            # takes a list of ExpressionSet keys and
                           # returns a list of ExpressionSets
storeExprSetAndMetadata()  # takes an ExpressionSet, stores it, and
                           # returns the associated unique key
....

These would give you the flexibility to change the underlying storage
mechanism to whatever you like as you go along, without changing the
business code.  Keeping the data model separate from the rest of the
code (that which controls the web application itself) is one of the
key concepts underlying the Model-View-Controller (MVC) approach to
application design.
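
As a rough illustration of the three functions above, here is a minimal
sketch that keeps objects as files on disk and metadata in a small CSV
index.  Everything beyond the three function names is an assumption made
for the example: the store location, the "rawdata" and "params" metadata
columns, the key format, and the use of saveRDS()/readRDS().  A plain
list stands in for an ExpressionSet, and a real setup would probably
keep the index in a database table rather than a CSV file.

```r
# Hypothetical file-backed store: one .rds file per object, plus a CSV index.
store_dir  <- file.path(tempdir(), "exprset_store")   # assumed location
dir.create(store_dir, showWarnings = FALSE)
index_file <- file.path(store_dir, "metadata.csv")

# Read the metadata index, or return an empty one if nothing is stored yet.
readIndex <- function() {
  if (!file.exists(index_file)) {
    return(data.frame(key = character(0), rawdata = character(0),
                      params = character(0), stringsAsFactors = FALSE))
  }
  read.csv(index_file, stringsAsFactors = FALSE, colClasses = "character")
}

# Store an object and its metadata; return the new unique key.
storeExprSetAndMetadata <- function(eset, rawdata, params) {
  idx <- readIndex()
  key <- sprintf("eset%06d", nrow(idx) + 1L)
  saveRDS(eset, file.path(store_dir, paste0(key, ".rds")))
  idx <- rbind(idx, data.frame(key = key, rawdata = rawdata, params = params,
                               stringsAsFactors = FALSE))
  write.csv(idx, index_file, row.names = FALSE)
  key
}

# Return the keys whose metadata match the given values (NULL = no filter).
queryMetadata <- function(rawdata = NULL, params = NULL) {
  idx  <- readIndex()
  keep <- rep(TRUE, nrow(idx))
  if (!is.null(rawdata)) keep <- keep & idx$rawdata %in% rawdata
  if (!is.null(params))  keep <- keep & idx$params  %in% params
  idx$key[keep]
}

# Load the objects for a vector of keys; the result is a list named by key.
fetchExprSets <- function(keys) {
  objs <- lapply(keys, function(k) {
    readRDS(file.path(store_dir, paste0(k, ".rds")))
  })
  names(objs) <- keys
  objs
}
```

Calling code would only ever see keys and objects, e.g.
fetchExprSets(queryMetadata(params = "rma")), so the CSV index could be
swapped for database queries later without touching the callers.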

In practical terms, since R can already serialize arbitrary objects to
a compact, optionally compressed format (via save()), it would be
appropriate to use that mechanism as a first pass; it could be
modified later if necessary.
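
For what it is worth, the same serialization is available in memory,
which is the form one would need for the BLOB approach Richard
describes.  The sketch below uses only base R (serialize(),
unserialize(), memCompress(), memDecompress()); the object is a made-up
stand-in for an ExpressionSet, and how the resulting raw vector gets
written to a BLOB column depends on the database driver, which is not
shown here.

```r
# Made-up stand-in for a large, complex object such as an ExpressionSet.
obj <- list(exprs = matrix(rnorm(20), nrow = 5),
            note  = "stand-in for an ExpressionSet")

# Serialize to a raw vector in memory, then gzip-compress it.
# serialize() by itself does not compress, hence the explicit memCompress().
blob <- memCompress(serialize(obj, connection = NULL), type = "gzip")

# 'blob' is a plain raw vector: it could be written to a file or handed
# to a database driver as a BLOB value.
is.raw(blob)

# Reverse the two steps to get the original object back.
restored <- unserialize(memDecompress(blob, type = "gzip"))
identical(obj, restored)
```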

Just my $0.02 worth.

Sean



