[R-sig-DB] Rdbi package plus draft proposal (was Re: Rdbi package)
David James
dj @end|ng |rom re@e@rch@be||-|@b@@com
Sat Oct 6 06:04:47 CEST 2001
Kurt Hornik wrote:
> >>>>> David James writes:
>
> > Kurt Hornik wrote:
> >> >>>>> M Edward Borasky writes:
> >>
> >> > Sounds good to me. At some point in the near future I may get a chance
> >> > to test it on a Windows box with MS Access; I've been using RODBC for
> >> > that.
> >>
> >> Ok. Will wait till tomorrow morning to allow for further reactions.
>
> > Not that I anticipate any major issues, but could we wait until the
> > end of this week? I'd like to check a couple of things (one that
> > Torsten brought to my attention last week and has to do with data
> > conversions, and portability to Splus being the other).
>
> End of this week is fine.
>
> -k
Hi,
Well, earlier this week I committed to taking a look at the Rdbi
package and to producing a first draft for the common database
interface (which I'm attaching as two text files: DBI.tex and
figure1.eps).
The Rdbi package looks quite reasonable to me. It defines most of
the core functionality we need, and I think that it wouldn't be too
hard to make existing packages be compatible with it. I also have
some suggestions and comments (in no particular order).
1. R/Splus compatibility.
From an R/Splus compatibility point of view, I'd suggest
renaming Rdbi to something neutral. Initially I had thought
as RSdbi, but why not simply call it DBI -- and thus friendly
to both.
2. I would add 2 more generic functions
dbRemoveTable(con, tbl.name, ...)
dbExistsTable(con, tbl.name,...)
in addition to the dbReadTable, dbWriteTable, etc.
The reason is that with these 2 we would have all the methods
require to be able to attach() any DBMS to the search path (thus
would allow us to simply attach(conn) and treat the remote tables
as simple data.frames). R does not yet provide facilities for
attaching arbitrary databases, but I believe that Duncan (and John?)
will be doing this soon(?).
3. I would add another class to group the various driver classes
(RODBC, RPgSQL, etc.)
4. Result class.
Rdbi defines only one class Rdbi.result to represent the result
of a query, yet there are 2 types of queries -- those that produce
output and those that don't. I can see that for simplicity having
only one is advantageous, but perhaps we should distinguish them
(the result of SELECT-like queries that produce output could
extend the results that don't).
5. I would replace the generic functions
dbConnectionInfo(conn)
dbResultInfo(res)
with a single dbGetInfo(obj), say, that dispatches on
connections, results (and drivers, see 3. above) to retrieve
meta-data for those objects.
(Probably I would leave dbColumnInfo(res) as defined in the
Rdbi package to explicitly extract meta-data from the individual
columns of the result set.)
6. dbReconnect()
From the comments in the Rdbi code, it appears that it's meant to
take a DBMS connection object (even from a previous session) and
re-instantiate the (broken) connection. It seems to me that this
would require storing passwords in the regular .RData or .Data.
From a security point of view, I'm uncomfortable doing this.
Is there a way of re-connecting without saving passwords in
obvious places? (I guess that this is more of an implementation
issue?)
7. Exceptions.
We need a dbGetException(conn) generic to report if and when
there are errors in a DBMS connection "conn"; if there has been
an error, it should report the code (number) and/or message.
8. Version information.
I think we want to ensure that packages that implement the DBI
report what version of the DBI they implements -- this could
easily be reported by the dbGetInfo() method of the driver. (We
may even require that each driver implementation reports its
version number.)
9. Common utilities.
Most packages that will implement the DBI will need some
common utilities, e.g., functions to map R/S names to SQL
valid identifiers, to "guess" the SQL type of R/S objects, etc.
We should provide these common methods in the DBI (of course,
individual packages may choose to extend them or completely
override them).
10. Data mappings.
As Tim has already pointed out, this needs more thought. The
packages RJava, RPerl, RPython (and the netscape R plugin)
have dealt with this issue (just like the R-Excel,R-DCOM, etc.)
We may want to implement the same converters ideas in some
of those?
11. Multiple instance of connections, result sets, etc.
The Rdbi package is silent re: allowing multiple DBMS object
simultaneously, i.e., multiple open drivers, DBMS connections,
result sets (RODBC, for instance, already implement multiple
connections).
I would suggest that the DBI should be explicit about
the possibility for having multiple instances of any of
these object. Of course, each DBMS driver should be free to
implement either single or multiple instances as it sees fit.
We then should define a couple of generic functions in the
spirit of getConnections (available both in R and Splus),
but for DBI objects, etc.
dbGetConnections(drv) # return connections on driver drv
dbGetResults(conn) # return result sets on connection conn
these could return a container (i.e., a list) of connections and
result objects, respectively (possibly with only one object).
12. At one point we had thought about making the DBMS interface
general enough so that non-relational DBMS could be covered
under the same API (HDF5 comes to mind, but also Berkeley DB).
Is this reasonable? Or are they so different to relational DBMS
that it's just not feasible? Or is it us that we have been too
narrowly focus? Somebody that's using non-relational should take
a closer look and let us know....
Now, let me move on to the draft paper on a common interface.
Based on the Rdbi package, the above points, and previous
discussions in various lists/forums, I've put together a rough
draft that attempts to flush out in enough detail a framework for
implementing the interface. I hope that after a few iterations,
it can be used as a guide by DBMS package developers.
I found that describing the interface in terms of classes is a lot
more natural (and easier) than in terms of (only) generic functions and
methods. Moreover, as I looked at other DB API's (Python, Perl, Java)
I felt more strongly that perhaps we should be using version 4
class-based instead of traditional version 3 object-based programming.
With the current library(methods) in R-devel in quite a reasonable shape,
I think the DBI is an ideal candidate project for using more advanced
programming tools.
So, let me throw in the idea of building the DBI using version 4
style classes. I believe that most of the (current) Rdbi could
be re-used, and I'd volunteer to migrate it to the version 4 style.
Regards,
--
David A. James
Statistics Research, Room 2C-253 Phone: (908) 582-3082
Bell Labs, Lucent Technologies Fax: (908) 582-3340
Murray Hill, NJ 09794-0636
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DBI.tex
Type: application/x-tex
Size: 23865 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-db/attachments/20011006/87f9f84b/attachment.tex>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: figure1.eps
Type: application/postscript
Size: 8110 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-db/attachments/20011006/87f9f84b/attachment.eps>
More information about the R-sig-DB
mailing list