[R-sig-DB] Rdbi package plus draft proposal (was Re: Rdbi package)L

Timothy H. Keitt tk||@t@ddr @end|ng |rom ke|tt|@b@b|o@@uny@b@edu
Mon Oct 8 21:19:53 CEST 2001


David James wrote:

>
>Hi,
>
>Well, earlier this week I committed to taking a look at the Rdbi
>package and to producing a first draft for the common database
>interface (which I'm attaching as two text files: DBI.tex and
>figure1.eps).
>
>The Rdbi package looks quite reasonable to me.  It defines most of
>the core functionality we need, and I think that it wouldn't be too
>hard to make existing packages be compatible with it.  I also have
>some suggestions and comments (in no particular order).
>
>1. R/Splus compatibility.
>
>   From an R/Splus compatibility point of view, I'd suggest
>   renaming Rdbi to something neutral.  Initially I had thought 
>   as RSdbi, but why not simply call it DBI -- and thus friendly
>   to both.
>

Agreed.

>
>
>2. I would add 2 more generic functions 
>
>       dbRemoveTable(con, tbl.name, ...)
>       dbExistsTable(con, tbl.name,...)  
>
>   in addition to the dbReadTable, dbWriteTable, etc. 
>

Yes, these are needed. I must have forgotten to move them over from RPgSQL.

>
>
>   The reason is that with these 2 we would have all the methods
>   require to be able to attach() any DBMS to the search path (thus
>   would allow us to simply attach(conn) and treat the remote tables
>   as simple data.frames).  R does not yet provide facilities for
>   attaching arbitrary databases, but I believe that Duncan (and John?) 
>   will be doing this soon(?).
>
This has been my dream all along---to be able to attach a database and 
use the native R interface to access the data. I'd be very excited to 
see some general hooks into the 'attach' command. Even still, this is 
actually not at all simple to implement without write access to the 
database. Also, as I've learned SQL, I find I want to run queries 
directly rather than use R on proxies. Anyway, it would be a nice feature.

>
>
>3. I would add another class to group the various driver classes
>   (RODBC, RPgSQL, etc.)
>
I used class 'PgSQL' in Rdbi.PgSQL, but the names are arbitrary.

>
>
>4. Result class
>
>   Rdbi defines only one class Rdbi.result to represent the result
>   of a query, yet there are 2 types of queries -- those that produce
>   output and those that don't.  I can see that for simplicity having
>   only one is advantageous, but perhaps we should distinguish them
>   (the result of SELECT-like queries that produce output could
>   extend the results that don't).
>

Interesting. I hadn't considered this. I don't see it as immediately 
usefull. Perhaps some mock code examples would help.

>
>
>5. I would replace the generic functions
>
>         dbConnectionInfo(conn)
>         dbResultInfo(res)
>   
>    with a single dbGetInfo(obj), say, that dispatches on
>    connections, results (and drivers, see 3. above) to retrieve
>    meta-data for those objects.
>
>    (Probably I would leave dbColumnInfo(res) as defined in the
>    Rdbi package to explicitly extract meta-data from the individual
>    columns of the result set.)
>

Agreed.

>
>
>6. dbReconnect() 
>
>   From the comments in the Rdbi code, it appears that it's meant to
>   take a DBMS connection object (even from a previous session) and
>   re-instantiate the (broken) connection.  It seems to me that this
>   would require storing passwords in the regular .RData or .Data.
>   From a security point of view, I'm uncomfortable doing this.
>   Is there a way of re-connecting without saving passwords in
>   obvious places?  (I guess that this is more of an implementation
>   issue?)
>

Perhaps we should just leave this out for now. I once thought it would 
be nice if table proxies could reinitialize their connections 
automatically, but it not essential at this time.

>
>
>7. Exceptions.
>
>   We need a dbGetException(conn) generic to report if and when
>   there are errors in a DBMS connection "conn";  if there has been
>   an error, it should report the code (number) and/or message.
>

In Rdbi, these are available using dbConnectionInfo(conn). Do we need a 
seperate function? (I'm not saying we don't.)

>
>
>8. Version information.
>
>   I think we want to ensure that packages that implement the DBI
>   report what version of the DBI they implements -- this could
>   easily be reported by the dbGetInfo() method of the driver. (We
>   may even require that each driver implementation reports its
>   version number.)
>

Agreed.

>
>
>9. Common utilities.
>
>   Most packages that will implement the DBI will need some
>   common utilities, e.g., functions to map R/S names to SQL
>   valid identifiers, to "guess" the SQL type of R/S objects, etc.
>   We should provide these common methods in the DBI (of course,
>   individual packages may choose to extend them or completely
>   override them).
>

I put these in util.R in Rdbi.

>
>
>10. Data mappings.
>
>    As Tim has already pointed out, this needs more thought. The
>    packages RJava, RPerl, RPython (and the netscape R plugin)
>    have dealt with this issue (just like the R-Excel,R-DCOM, etc.)
>    We may want to implement the same converters ideas in some
>    of those?
>

Unless we are prepared to store metadata in seperate tables (requires 
write access to the database), we're pretty much limited to dealing with 
unique types defined in the database. My solution (types.R in 
Rdbi.PgSQL) was a function that returns the SQL type given an R object, 
a function that takes an R object and creates a corresponding SQL 
formatted string and a function that takes an SQL formatted string and 
converts it back to the R type. These could be made into generic methods.

>
>
>11. Multiple instance of connections, result sets, etc.
>
>    The Rdbi package is silent re: allowing multiple DBMS object
>    simultaneously, i.e., multiple open drivers, DBMS connections,
>    result sets (RODBC, for instance, already implement multiple
>    connections).
>

This was true in RPgSQL, but not Rdbi. In Rdbi, you can have as many 
'conn' and 'result' objects open as resources permit. I don't have the 
concept of a driver object, because I'm not convinced its needed.

>
> 
>    I would suggest that the DBI should be explicit about
>    the possibility for having multiple instances of any of
>    these object.  Of course, each DBMS driver should be free to
>    implement either single or multiple instances as it sees fit.
>    We then should define a couple of generic functions in the
>    spirit of getConnections (available both in R and Splus),
>    but for DBI objects, etc.
>
>      dbGetConnections(drv)     # return connections on driver drv
>      dbGetResults(conn)        # return result sets on connection conn
>      
>    these could return a container (i.e., a list) of connections and
>    result objects, respectively (possibly with only one object).
>

Its reasonable, but is it essential? Would anybody use these functions?

>
>
>12. At one point we had thought about making the DBMS interface 
>    general enough so that non-relational DBMS could be covered 
>    under the same API (HDF5 comes to mind, but also Berkeley DB).
>    Is this reasonable?  Or are they so different to relational DBMS 
>    that it's  just not feasible?  Or is it us that we have been too
>    narrowly focus?  Somebody that's using non-relational should take
>    a closer look and let us know....
>

This is a good point.

>
>
>
>Now, let me move on to the draft paper on a common interface.
>
>Based on the Rdbi package, the above points, and previous
>discussions in various lists/forums, I've put together a rough
>draft that attempts to flush out in enough detail a framework for
>implementing the interface.  I hope that after a few iterations,
>it can be used as a guide by DBMS package developers.
>
>I found that describing the interface in terms of classes is a lot
>more natural (and easier) than in terms of (only) generic functions and
>methods.  Moreover, as I looked at other DB API's (Python, Perl, Java)
>I felt more strongly that perhaps we should be using version 4
>class-based instead of traditional version 3 object-based programming.
>With the current library(methods) in R-devel in quite a reasonable shape, 
>I think the DBI is an ideal candidate project for using more advanced
>programming tools.  
>
>So, let me throw in the idea of building the DBI using version 4
>style classes.  I believe that most of the (current) Rdbi could
>be re-used, and I'd volunteer to migrate it to the version 4 style.
>

Agreed. Does R yet support v.4 style classes? Is there a good 
reference/tutorial?

>
>Regards,
>
> DBI.tex
>
> Content-Type:
>
> application/x-tex
>
>
> ------------------------------------------------------------------------
> figure1.eps
>
> Content-Type:
>
> application/postscript
>
>

-- 
Timothy H. Keitt
Department of Ecology and Evolution
State University of New York at Stony Brook
Stony Brook, New York 11794 USA
Phone: 631-632-1101, FAX: 631-632-7626
http://life.bio.sunysb.edu/ee/keitt/






More information about the R-sig-DB mailing list