[R-sig-DB] dbWriteTable() is renaming the 'end' column

Hervé Pagès hp@ge@ @end|ng |rom |hcrc@org
Wed Sep 30 08:39:19 CEST 2009


Prof Brian Ripley wrote:
> I presume this is about dbWriteTable in package DBI, and also RSQLite, 
> but neither are mentioned and the 'example' fails to run as written.
> 
> On Tue, 29 Sep 2009, Hervé Pagès wrote:
> 
>> Gabor Grothendieck wrote:
>>> end is a reserved SQL keyword.
>>
>> Ok, maybe so what?
>>
>> This works:
>>
>>  dbSendQuery(con, "CREATE TABLE mydata (start INTEGER, end INTEGER)")
>>
>> and if it didn't, I should get an error (or at least a warning).
>> Then I could try again with:
>>
>>  dbSendQuery(con, "CREATE TABLE mydata (`start` INTEGER, `end` INTEGER)")
>>
>> Using backquotes allows me to use whatever name I want for my columns, 
>> even SQL keywords. This is ANSI SQL.
> 
> Without telling us *which* ANSI SQL standard this is unclear, but I 
> believe that it is a false claim.  It seems so do the authors of MySQL, 
> who have an ANSI_QUOTE mode using " to quote identifiers.  As I know 
> from RODBC, this is one of those areas of differences between actual SQL 
> engines, and one which ODBC does not attempt to paper over.

You are right that in the ANSI SQL standard, an SQL delimited identifier
is delimited by double quotes, not backquotes, sorry.
Anyway my point is that, among other possibilities, dbWriteTable() could
send a CREATE statement with delimited identifiers, so SQL keywords are
not an issue anymore. This is only one possibility. There are others,
like just returning an error if the column name is a keyword, or
allowing the user disable the mangling, etc. Anything
sounds better than silently mangling the column names.

> 
> See also http://www.mail-archive.com/sqlite-users@sqlite.org/msg09583.html
> -- so it looks like backquotes were added to SQLite in 2005, long after 
> the design of DBI/RSQLite, and as a non-ANSI extension.

According to

   SQLKeywords(dbDriver("SQLite"))

"end" doesn't seem to be an SQLite keyword. Besides, sending the
CREATE statement

   CREATE TABLE mydata (start INTEGER, end INTEGER)

with dbSendQuery() just works. And finally:

   dbWriteTable(con, "end", mydata)

doesn't bother to mangle the name of the "end" *table*.
So it's really hard to understand why it bothers to mangle the
name of the "end" *column*.

> 
> RODBC successfully uses double quotes for identifiers with SQLite, and 
> the RODBC version of your example works as you expected.

Always good to know, thanks!

> 
>> IMO, it would be much better if dbWriteTable() was:
>>  - not trying to be too smart and just send the CREATE statement,
>>    leaving the DB backend decide whether it's valid or not, and
>>    eventually return an error,
>>  - if dbWriteTable() really wants to be smart, then it could try
>>    a little bit harder to create the table I want with the column
>>    names I want by using backquotes.
> 
> But DBI::dbWriteTable() does neither of these: it calls the method in 
> the backend, which in the case of RSQLite is sqliteWriteTable(). That 
> calls RSQLite::dbBuildTableDefinition() to make the CREATE TABLE query, 
> and that calls DBI::make.db.names(), which is a generic with a method in 
> RSQLite.
> 
> So the correct subject of your scorn would seem to be 
> RSQLite::dbBuildTableDefinition() and its lack of documentation. As I 
> have argued earlier, I think that the solution has to be specific to the 
> backend.

Thanks for all the details. Knowing the details of the call stack is
interesting... but does it really matter here? The end user
calls dbWriteTable() in order to write a table to the DB, not
sqliteWriteTable(), or RSQLite::dbBuildTableDefinition(), or
DBI::make.db.names().

Cheers,
H.

 > sessionInfo()
R version 2.10.0 Under development (unstable) (2009-09-24 r49817)
x86_64-unknown-linux-gnu

locale:
  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RSQLite_0.7-2 DBI_0.2-4

loaded via a namespace (and not attached):
[1] tools_2.10.0

> 
>> Just my 2 c.
> 
> 
>> Cheers,
>> H.
>>
>>>
>>> 2009/9/29 Hervé Pagès <hpages using fhcrc.org>:
>>>> Hi,
>>>>
>>>> Consider the following data:
>>>>
>>>>  mydata <- data.frame(start=1:5, end=11:15)
>>>>
>>>> Then storing and retrieving the data shows that the 'end'
>>>> column was renamed:
>>>>
>>>>  drv <- dbDriver("SQLite")
>>>>  con <- dbConnect(drv, dbname=":memory:")
>>>>
>>>>  > dbWriteTable(con, "mydata", mydata)
>>>>  [1] TRUE
>>>>  > dbReadTable(con, "mydata")
>>>>    start end__1
>>>>  1     1     11
>>>>  2     2     12
>>>>  3     3     13
>>>>  4     4     14
>>>>  5     5     15
>>>>
>>>> The culprit doesn't seem to be dbReadTable():
>>>>
>>>>  > dbGetQuery(con, "SELECT * FROM mydata")
>>>>    row_names start end__1
>>>>  1         1     1     11
>>>>  2         2     2     12
>>>>  3         3     3     13
>>>>  4         4     4     14
>>>>  5         5     5     15
>>>>
>>>> and using the sqlite3 client from outside R confirms this.
>>>>
>>>> Looks like a bug in dbWriteTable().
>>>>
>>>> Cheers,
>>>> H.
>>>>
>>>> -- 
>>>> Hervé Pagès
>>>>
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M2-B876
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>>
>>>> E-mail: hpages using fhcrc.org
>>>> Phone:  (206) 667-5791
>>>> Fax:    (206) 667-1319
>>>>
>>>> _______________________________________________
>>>> R-sig-DB mailing list -- R Special Interest Group
>>>> R-sig-DB using stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>>>>
>>
>> -- 
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M2-B876
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages using fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
>> _______________________________________________
>> R-sig-DB mailing list -- R Special Interest Group
>> R-sig-DB using stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>>
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319




More information about the R-sig-DB mailing list