[R-sig-DB] SQL escaping/quoting proposal

Lee Hachadoorian Lee@H@ch@door|@n+L @end|ng |rom gm@||@com
Fri Oct 18 19:26:10 CEST 2013


On 10/18/2013 12:44 PM, Hadley Wickham wrote:
> Hi all,
>
> The approach that DBI takes to escaping is sub-optimal: it tries to
> figure out if an R variable name matches an SQL reserved word, and if
> so munge it so that there's no longer a conflict. This creates a
> situation where there are some identifiers that are valid in R, and
> some that are valid in SQL and we have a complicated and bug prone
> approach to converting between them.
>
> Instead, I recommend taking an approach where identifiers (i.e. table
> and field names) are always quoted using the appropriate database
> syntax. This not only avoids any problems with SQL reserved words, but
> it also ensures that every field name in R (even those containing
> spaces and other special characters) can be used in SQL.
>
> To achieve this change, I think we should to:
>
> * deprecate `make.db.names()`, `isSQLKeyword()`, and `SQLKeywords()`
> * add new generics `sqlQuoteString()` and `sqlQuoteIdentifier()`.
>
> The new generics would be defined on the driver object, and would come
> with default methods as follows:
>
> ```
> setGeneric("sqlQuoteString", function(drv, x, ...) {
>    standardGeneric("sqlQuoteString")
> })
> setMethod("sqlQuoteString", "DBIDriver", function(drv, x, ...) {
>    x <- gsub('"', '""', x, fixed = TRUE)
>    paste('"', x, '"', sep = "")
> })
>
> setGeneric("sqlQuoteIdentifer", function(drv, x, ...) {
>    standardGeneric("sqlQuoteIdentifer")
> })
> setMethod("sqlQuoteString", "DBIDriver", function(drv, x, ...) {
>    x <- gsub("'", "''", x, fixed = TRUE)
>    paste("'", x, "'", sep = "")
> })
> ```
>
> Individual implementations would be encouraged to provide methods that
> use the quoting functions provided by the client library, where
> available.
>
> Does anyone see any problems with this approach?
>
> Hadley
>
>

Hadley,

Admittedly, I rarely use R to *create* SQL table definitions. But I 
would like to preserve the possibility of (a) creating db-safe 
identifiers, (b) not using SQL quoting when I don't want to.

Regarding (a), I would suggest *not* deprecating `make.db.names()`, etc.

Regarding (b), my reasoning is that when working with Postgres, I would 
want to allow Postgres to do its normal lower casing of unquoted 
identifiers. That is, `thisField` is a valid identifier, but passed in 
quoted case will be preserved, which means always having to quote the 
identifier in the future. Passed in quoted, Postgres will force it to 
`thisfield`, and if a client requests `thisField` unquoted, the correct 
field will be returned.

--Lee

-- 
Lee Hachadoorian
Assistant Professor in Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu




More information about the R-sig-DB mailing list