RFC: large database interface

Egon Schmid eschmid@delos.lf.net
Sun, 19 Apr 1998 13:10:37 +0200


Thomas Lumley wrote:
> 
> I have been playing with a large database interface for R, and have
> written one complete but useless demonstration and one incomplete but
> potentially useful example (with memory mapping of a fixed-format ASCII
> file). The idea is to make the file appear like a matrix or data frame but
> not have to read it into the R heap.
> 
> A description and code can be found at
> http://www.biostat.washington.edu/~thomas/Rdb.html
>                                           Rdb.nw  (noweb literate program)
>                                           Rdb.c
>                                           Rdb.R
> 
> Comments?
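
To make the memory-mapping idea quoted above concrete, here is a minimal
sketch in Python. It is not the code from Rdb.c or Rdb.R, which is not
reproduced in this message. The class name FixedWidthMatrix, the helper
write_demo_file, and the record layout (ncol fields of equal width followed
by a single newline) are all assumptions made for illustration.

```python
import mmap
import os
import tempfile


class FixedWidthMatrix:
    """Read-only matrix view over a fixed-format ASCII file via mmap.

    Assumed (hypothetical) layout: each record holds `ncol` fields of
    `width` bytes, followed by exactly one newline byte.
    """

    def __init__(self, path, ncol, width):
        self.ncol = ncol
        self.width = width
        self.reclen = ncol * width + 1  # +1 for the trailing "\n"
        self._fh = open(path, "rb")
        # Map the whole file read-only; pages are loaded on demand by the OS.
        self._map = mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)
        self.nrow = len(self._map) // self.reclen

    def __getitem__(self, index):
        i, j = index
        if not (0 <= i < self.nrow and 0 <= j < self.ncol):
            raise IndexError("matrix index out of range")
        # Byte offset of field (i, j) follows directly from the fixed layout.
        start = i * self.reclen + j * self.width
        return float(self._map[start:start + self.width].decode("ascii"))

    def close(self):
        self._map.close()
        self._fh.close()


def write_demo_file(path, rows, width):
    """Write rows of numbers as right-aligned fixed-width fields."""
    with open(path, "w", newline="\n") as fh:
        for row in rows:
            fh.write("".join(f"{x:>{width}.2f}" for x in row) + "\n")


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    write_demo_file(demo_path, [[1.0, 2.5], [3.0, 4.25], [5.5, 6.0]], width=8)
    m = FixedWidthMatrix(demo_path, ncol=2, width=8)
    print(m.nrow, m[1, 1])
    m.close()
```

Only the bytes of the requested field are decoded on each access, so the
file never has to be read into the interpreter's heap as a whole.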

Well, there is a web interface through PHP (PHP Hypertext Preprocessor),
which runs as an Apache module. There are plenty more database interfaces
at http://www.php.net/.

Personally, I think it would be a great idea to interface large datasets
with netCDF:

	http://www.unidata.ucar.edu/packages/netcdf

From the manual, section 1.2, 'NetCDF Is Not a Database Management System':

"Why not use an existing database management system for storing
array-oriented data? Relational database software is not suitable for
the kinds of data access supported by the netCDF interface.

First, existing database systems that support the relational model do
not support multidimensional objects (arrays) as a basic unit of data
access. Representing arrays as relations makes some useful kinds of data
access awkward and provides little support for the abstractions of
multidimensional data and coordinate systems. A quite different data
model is needed for array-oriented data to facilitate its retrieval,
modification, mathematical manipulation and visualization.

Related to this is a second problem with general-purpose database
systems: their poor performance on large arrays. Collections of
satellite images, scientific model outputs and long-term global weather
observations are beyond the capabilities of most database systems to
organize and index for efficient retrieval.

Finally, general-purpose database systems provide, at significant cost
in terms of both resources and access performance, many facilities that
are not needed in the analysis, management, and display of
array-oriented data. For example, elaborate update facilities, audit
trails, report formatting, and mechanisms designed for
transaction-processing are unnecessary for most scientific
applications."
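
The first point in the quoted passage can be illustrated with a small
sketch in plain Python. It does not use the netCDF library or any real
database. The dimension names (time, lat, lon), the shape, and all function
names are hypothetical. It contrasts fetching a slab from an array stored
as relational tuples with fetching it from a row-major flat array.

```python
from itertools import product

SHAPE = (2, 3, 4)  # hypothetical dimensions: (time, lat, lon)


def make_values(shape):
    """Flat row-major list of cell values 0.0, 1.0, 2.0, ..."""
    n = shape[0] * shape[1] * shape[2]
    return [float(k) for k in range(n)]


def as_relation(values, shape):
    """One (time, lat, lon, value) tuple per cell, as in a relational table."""
    coords = product(range(shape[0]), range(shape[1]), range(shape[2]))
    return [(t, y, x, v) for (t, y, x), v in zip(coords, values)]


def slab_from_relation(rows, t, lat_range):
    """Select the slab at one time step by scanning every tuple."""
    lo, hi = lat_range
    return [v for (tt, y, x, v) in rows if tt == t and lo <= y < hi]


def slab_from_array(values, shape, t, lat_range):
    """Same slab from the flat array, using offset arithmetic only."""
    _, nlat, nlon = shape
    lo, hi = lat_range
    start = (t * nlat + lo) * nlon
    stop = (t * nlat + hi) * nlon
    return values[start:stop]


if __name__ == "__main__":
    values = make_values(SHAPE)
    rows = as_relation(values, SHAPE)
    a = slab_from_relation(rows, t=1, lat_range=(1, 3))
    b = slab_from_array(values, SHAPE, t=1, lat_range=(1, 3))
    print(a == b, len(b))
```

Both functions return the same eight values, but the relational version
inspects all 24 tuples while the array version computes one contiguous
range. A real database would use an index rather than a full scan, so this
only shows the difference in data model, not a benchmark.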

On Feb 3 there was a small thread on this mailing list. 

-Egon
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._