[R-sig-DB] general status query on DBI etc

David James dj @end|ng |rom re@e@rch@be||-|@b@@com
Sun Jul 21 05:16:30 CEST 2002


Vincent J. Carey, Jr. wrote:
> 
> Here's my understanding of the R-RDBMS interface situation,
> after poking around CRAN and the SIG page/mailing list.
> 
> 1) DBI is available on CRAN but it is not clear if any
> compliant drivers are on hand; of particular interest
> are postgres and oracle drivers

Correct.  As of today no driver is compliant.

> 
> 2) Rdbi is available on sourceforge; Rdbi.PgSQL is
> also available there and the latter provides the postgres
> driver for the Rdbi (not DBI) API.  Tim Keitt indicates
> that all the C code that would be necessary for a postgres
> driver for DBI is present in Rdbi.PgSQL, but DBI-compliant
> R code needs to be written
> 
> 3) ROracle is available on CRAN, and 0.3-3 DESCRIPTION
> identifies it as transitional; the next version will
> satisfy the DBI API.

Yes, those plans are current.

> 
> 4) RODBC is at CRAN devel (not on the main package distribution
> page) and the RS-DBI.pdf document suggests that unix ODBC
> is inadequately developed.  Based on my limited web searching,
> ODBC does not seem a viable approach for unix at present.

Hmm, that was written sometime ago.  The unixODBC project has
been making quite a bit of progress, and the driver manager 
seems to be getting quite good (although I haven't tested it too
much).  My main concern back then was the lack of ODBC drivers.
Currently the open source DBMS (at least PostsgreSQL and MySQL)
seem to provide quite decent ODBC drivers.  The availability of
free/open source drivers on Linux/Unix/MacOS for Oracle, MS SQL
server, and others is still an issue, AFAIK; (there are good
commercial drivers for Unix, though).

My feeling is that an R-ODBC interface is critical -- certainly
on Windows, but also on Unix and probably on Mac (I'm not sure how
Mac deals w. DBMSs).

> 
> Note: Bioconductor has made substantial use of RPgSQL for
> databasing genomic annotation data.  The fact that RPgSQL has
> been abandoned by its maintainer is a source of concern.
> We are starting to strategize on the problem of storing and
> analyzing large quantities of expression data in RDBMS and
> we look to the DB-SIG for guidance on resources related
> to this problem.
> 
> Questions
> 
> 1) How far off is the DBI-compliant ROracle?  Are there 
> risks that code developed using the transitional version will
> require substantial reworking when the new version emerges?

The current ROracle (same for RMySQL and, of less interest, RSQLite) 
is fairly close to the DBI.  All the functionality in the
DBI is available in these other packages.  From the user's point
of view, the only difference is the function names (close(con)
vs dbDisconnect(con), etc.).  These do not require very extensive
work to have them compliant with the DBI.  At the programming level
the issue is also straightforward -- the current implementations
are done using S3 style classes and probably should be migrated
to S4 classes.  (Somewhat ironically, both the Oracle and MySQL
interfaces were originally implemented in S4, ported backwards
to S3 style classes for R compatibility and now they can finally be
fully implemented with S4 classes.)   In terms of programming, I
think we're talking about a week's effort, or less. 

> 
> 2) Is anyone working on the postgres driver for DBI?
> Apparently most of the C code is available.

When implementing a driver, the C portion is probably where the
most work is required.  I'm not very familiar with this code, but
perhaps moving to the DBI wouldn't be too difficult.  

Last december I wrote DBI interfaces on top of both the existing
RPgSQL and RODBC, but I thought (and Tim agreed with me) that
the resulting layering of S4 on top of S3 classes wasn't ideal,
so these DBI.RpSQL and DBI.RODBC packages were not made public.

> 
> 3) RS-DBI.pdf suggests a number of alternative architectures
> (e.g., ODBC, JDBC).  Is the slow emergence of drivers for DBI
> ascribable to uncertainty about the long-term viability of
> the DBI approach?  Has RSJava matured to the point where
> one might prefer a JDBC-centered approach?

Re: ODBC, see my comments above.

Re: R/JDBC, I'm not sure -- I have little experience with Java, so perhaps
others can comment.

The slow emergence of drivers for the DBI, in my opinion, is
the lack of volunteers.  Even when we were drafting the DBI the
participation was not exactly overwhelming, as you probably noticed
when you looked at the r-sig-db archives.  But I feel that it is
important to have a common interface to DBMS, one reason being
to abstract out (at least as far as R is concerned) the details
of getting your data into your analysis.  Thus in the long run we
should be deciding what DBMS to use based on their merits and not
on whether the R API to DBMS "A" is well thought out but not for "B"
(asumming they provide more or less similar functionality).

Moreover, I think that Perl's DBI/DBD, Java's JDBC, ODBC, and
Python's DB-API have proved the viability of this approach.
Of course, any flaws in the R/S DBI should be fixed -- but that's
just implementation:-)

> 
> Thanks
> -- 
> ---
> Vince Carey, PhD
> Ass't Prof Med (Biostatistics)
> Harvard Medical School
> Channing Laboratory - ph 6175252265 fa 6177311541 cell 8572126768
> 181 Longwood Ave Boston MA 02115 USA
> 
> stvjc using channing.harvard.edu
> _______________________________________________
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB using stat.math.ethz.ch
> http://www.stat.math.ethz.ch/mailman/listinfo/r-sig-db

-- 
David A. James
Statistics Research, Room 2C-253            Phone:  (908) 582-3082       
Bell Labs, Lucent Technologies              Fax:    (908) 582-3340
Murray Hill, NJ 09794-0636



More information about the R-sig-DB mailing list