[R-sig-DB] RSQLite: ATTACH statement not executed when the db connection is holding a resultSet

Sat Jan 19 19:32:33 CET 2008

Hi Herve,

Herve Pages <hpages using fhcrc.org> writes:
> No use-case. But we have no way no prevent our users to make improper
> use of dbSendQuery() (we'll try to advertise the use of much safer
> dbGetQuery() instead but that's all we can do).

At present, dbGetQuery is IMO less safe, or at least more confusing
(see more below).

> Yes I've seen this. And if the first result set does not correspond to a
> SELECT statement, then it is silently replaced by the next result
> set:

Almost.  If my reading of the code is correct, a ResultSet contains a
flag indicating whether or not is is complete.  For a SELECT query,
complete set to FALSE means there are more rows to fetch.  For
non-SELECT queries it is assumed that complete is TRUE right away
(there is a comment in the code that this might cause a problem if
there are asynchronous queries, but I don't think this is an issue
with SQLite).

When RSQLite attempts to execute SQL, it checks for an existing
ResultSet and _closes is automatically_ if its complete flag is TRUE.
If the existing ResultSet is not complete, an error is thrown.

> Personally I find this "automatic result set clearing" feature a little
> bit confusing and it raises some concerns about compatibility with other
> RDBMSs (see below).

I agree that this can cause confusion and that using this "feature"
defeats the notion of DBI (being able to swap RDBMS's without changing
much code).

But I didn't invent this feature and don't have any sense of whether
or not people are relying on it :-\

> Back to the ATTACH problem: if I try to ATTACH with dbSendQuery() instead
> of dbGetQuery() then the "automatic result set clearing" feature enter in
> action:

Yes.  But what is confusing is what dbGetQuery does.  The semantics of
dbGetQuery are not what they seem.  When a ResultSet already exists,
dbGetQuery creats a new _connection_ and executes there.  SQL
statements like ATTACH that are called essentially for their
side-effect on the connection are lost.  In your example, the ATTACH
works just fine, but occurs on a new (transient) connection.  If
instead of dbGetQuery, you had used dbSendQuery, things would have
worked.

Here's my example:

  library("RSQLite")
  db1 <- dbConnect(SQLite(), "db1.sqlite")
  data(USArrests)
  dbWriteTable(db1, "ua", USArrests, row.names=FALSE)

  db2 <- dbConnect(SQLite(), "db2.sqlite")
  res <- dbSendQuery(db2, "CREATE TABLE t1 (b integer, bb text)")

  ## this does the ATTACH on a transient connection :-(
  dbGetQuery(db2, "ATTACH 'db1.sqlite' AS db1")
  ## Hence, this fails:
  dbGetQuery(db2, "select * from db1.ua limit 3")

  ## redo ATTACH... this works because ResultSet res is complete
  ## and so it can be auto-closed.
  res2 <- dbSendQuery(db2, "ATTACH 'db1.sqlite' AS db1")

  ## Note that dbGetQuery _still_ won't work because now
  ## res2 is open (although complete) and the new transient connection
  ## won't have the ATTACH'ed DB.
  dbGetQuery(db2, "select * from db1.ua limit 3")

  ## But using dbSendQuery, we get the auto-close behavior
  res3 <- dbSendQuery(db2, "select * from db1.ua limit 3")
  fetch(res3)

  ## And if we close the ResultSet, dbGetQuery behaves as "expected".
  dbClearResult(res3)
  dbGetQuery(db2, "select * from db1.ua limit 3")

>> For result sets that don't return rows, there isn't much good that you
>> can do with the result set.  Perhaps these should be invalidated on
>> return so that you can open another resultSet?
>
> The problem is consistency with other RDBMSs: if the users want to be able
> to write portable code, they need to have some guarantee that things behave
> consistently between RSQLite, RMySQL, ROracle, etc...
> In this regard I would say it's better to keep things as predictable as
> possible and maybe to avoid features like the "automatic result set clearing".
> Every successful call to dbSendQuery creates a new entry in the result set
> table and this entry remains here until explicitly removed by the user,
> whatever the nature of the SQL statement was. If SQLite allows only one
> result set per connection, then an error should be raised when dbSendQuery
> is called and there is already an entry in the result set table.

I agree that consistency is valuable and is in large part the point of
DBI.  However, things will never be symmetric since SQLite doesn't
support the same features as other RDBMS's.  Code that happily creates
multiple ResultSets might work find in RMySQL (does it?) would raise
errors with RSQLite.

I guess I see your argument, but the auto-close feature doesn't
prevent you from making explicit dbClearResult calls and having
portable code.

> I can see that the "automatic result set clearing" feature adds some
> convenience by discarding result sets that DO NOT correspond to a SELECT
> statement (so that they never end up in the result set table) but then
> everybody in the DBI family should do the same thing.
>
> As for treating result sets that DO correspond to a SELECT statement
> but contain 0 rows the same way as result sets that DO NOT correspond
> to a SELECT statement, I don't really like the idea. Because the 2 types
> of result sets are fundamentally different: having rows or not in the
> former depends on the data found in the DB while getting a result set
> of the latter type does not (it only depends on a programmatic decision
> i.e. on the nature of the SQL statement that was sent).

I think there are two ways to go here:

1. Fix dbGetQuery to have the same auto-close semantics.

2. Remove the auto-close feature entirely.  Should dbGetQuery change
   too such that it fails when a ResultSet is open instead of opening
   a new connection?  That is easier to explain, but not at all
   backwards compatible.

+ seth

-- 
Seth Falcon | seth using userprimary.net | blog: http://userprimary.net/user/