[R-sig-DB] Rdbi package plus draft proposal (was Re: Rdbi package)

David James dj @end|ng |rom re@e@rch@be||-|@b@@com
Sat Oct 6 06:04:47 CEST 2001


Kurt Hornik wrote:
> >>>>> David James writes:
> 
> > Kurt Hornik wrote:
> >> >>>>> M Edward Borasky writes:
> >> 
> >> > Sounds good to me. At some point in the near future I may get a chance
> >> > to test it on a Windows box with MS Access; I've been using RODBC for
> >> > that.
> >> 
> >> Ok.  Will wait till tomorrow morning to allow for further reactions.
> 
> > Not that I anticipate any major issues, but could we wait until the
> > end of this week?  I'd like to check a couple of things (one that
> > Torsten brought to my attention last week and has to do with data
> > conversions, and portability to Splus being the other).
> 
> End of this week is fine.
> 
> -k

Hi,

Well, earlier this week I committed to taking a look at the Rdbi
package and to producing a first draft for the common database
interface (which I'm attaching as two text files: DBI.tex and
figure1.eps).

The Rdbi package looks quite reasonable to me.  It defines most of
the core functionality we need, and I think that it wouldn't be too
hard to make existing packages be compatible with it.  I also have
some suggestions and comments (in no particular order).

1. R/Splus compatibility.

   From an R/Splus compatibility point of view, I'd suggest
   renaming Rdbi to something neutral.  Initially I had thought 
   as RSdbi, but why not simply call it DBI -- and thus friendly
   to both.

2. I would add 2 more generic functions 

       dbRemoveTable(con, tbl.name, ...)
       dbExistsTable(con, tbl.name,...)  

   in addition to the dbReadTable, dbWriteTable, etc. 

   The reason is that with these 2 we would have all the methods
   require to be able to attach() any DBMS to the search path (thus
   would allow us to simply attach(conn) and treat the remote tables
   as simple data.frames).  R does not yet provide facilities for
   attaching arbitrary databases, but I believe that Duncan (and John?) 
   will be doing this soon(?).

3. I would add another class to group the various driver classes
   (RODBC, RPgSQL, etc.)

4. Result class.

   Rdbi defines only one class Rdbi.result to represent the result
   of a query, yet there are 2 types of queries -- those that produce
   output and those that don't.  I can see that for simplicity having
   only one is advantageous, but perhaps we should distinguish them
   (the result of SELECT-like queries that produce output could
   extend the results that don't).

5. I would replace the generic functions

         dbConnectionInfo(conn)
         dbResultInfo(res)
   
    with a single dbGetInfo(obj), say, that dispatches on
    connections, results (and drivers, see 3. above) to retrieve
    meta-data for those objects.

    (Probably I would leave dbColumnInfo(res) as defined in the
    Rdbi package to explicitly extract meta-data from the individual
    columns of the result set.)

6. dbReconnect() 

   From the comments in the Rdbi code, it appears that it's meant to
   take a DBMS connection object (even from a previous session) and
   re-instantiate the (broken) connection.  It seems to me that this
   would require storing passwords in the regular .RData or .Data.
   From a security point of view, I'm uncomfortable doing this.
   Is there a way of re-connecting without saving passwords in
   obvious places?  (I guess that this is more of an implementation
   issue?)

7. Exceptions.

   We need a dbGetException(conn) generic to report if and when
   there are errors in a DBMS connection "conn";  if there has been
   an error, it should report the code (number) and/or message.

8. Version information.

   I think we want to ensure that packages that implement the DBI
   report what version of the DBI they implements -- this could
   easily be reported by the dbGetInfo() method of the driver. (We
   may even require that each driver implementation reports its
   version number.)

9. Common utilities.

   Most packages that will implement the DBI will need some
   common utilities, e.g., functions to map R/S names to SQL
   valid identifiers, to "guess" the SQL type of R/S objects, etc.
   We should provide these common methods in the DBI (of course,
   individual packages may choose to extend them or completely
   override them).

10. Data mappings.

    As Tim has already pointed out, this needs more thought. The
    packages RJava, RPerl, RPython (and the netscape R plugin)
    have dealt with this issue (just like the R-Excel,R-DCOM, etc.)
    We may want to implement the same converters ideas in some
    of those?

11. Multiple instance of connections, result sets, etc.

    The Rdbi package is silent re: allowing multiple DBMS object
    simultaneously, i.e., multiple open drivers, DBMS connections,
    result sets (RODBC, for instance, already implement multiple
    connections).
 
    I would suggest that the DBI should be explicit about
    the possibility for having multiple instances of any of
    these object.  Of course, each DBMS driver should be free to
    implement either single or multiple instances as it sees fit.
    We then should define a couple of generic functions in the
    spirit of getConnections (available both in R and Splus),
    but for DBI objects, etc.

      dbGetConnections(drv)     # return connections on driver drv
      dbGetResults(conn)        # return result sets on connection conn
      
    these could return a container (i.e., a list) of connections and
    result objects, respectively (possibly with only one object).

12. At one point we had thought about making the DBMS interface 
    general enough so that non-relational DBMS could be covered 
    under the same API (HDF5 comes to mind, but also Berkeley DB).
    Is this reasonable?  Or are they so different to relational DBMS 
    that it's  just not feasible?  Or is it us that we have been too
    narrowly focus?  Somebody that's using non-relational should take
    a closer look and let us know....


Now, let me move on to the draft paper on a common interface.

Based on the Rdbi package, the above points, and previous
discussions in various lists/forums, I've put together a rough
draft that attempts to flush out in enough detail a framework for
implementing the interface.  I hope that after a few iterations,
it can be used as a guide by DBMS package developers.

I found that describing the interface in terms of classes is a lot
more natural (and easier) than in terms of (only) generic functions and
methods.  Moreover, as I looked at other DB API's (Python, Perl, Java)
I felt more strongly that perhaps we should be using version 4
class-based instead of traditional version 3 object-based programming.
With the current library(methods) in R-devel in quite a reasonable shape, 
I think the DBI is an ideal candidate project for using more advanced
programming tools.  

So, let me throw in the idea of building the DBI using version 4
style classes.  I believe that most of the (current) Rdbi could
be re-used, and I'd volunteer to migrate it to the version 4 style.

Regards,

-- 
David A. James
Statistics Research, Room 2C-253            Phone:  (908) 582-3082       
Bell Labs, Lucent Technologies              Fax:    (908) 582-3340
Murray Hill, NJ 09794-0636

-------------- next part --------------
A non-text attachment was scrubbed...
Name: DBI.tex
Type: application/x-tex
Size: 23865 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-db/attachments/20011006/87f9f84b/attachment.tex>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: figure1.eps
Type: application/postscript
Size: 8110 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-db/attachments/20011006/87f9f84b/attachment.eps>


More information about the R-sig-DB mailing list