[R-sig-DB] SQL escaping/quoting proposal

NISHIYAMA Tomoaki tomo@k|n @end|ng |rom @t@||@k@n@z@w@-u@@c@jp
Sat Oct 19 06:00:19 CEST 2013


Hi Hadley,

Thanks to bring up this issue and 
I agree in the overall direction and have some minor concerns.

Could you provide a bit more clarification on what the sqlQuoteIdentifier should do?
How shall we deal when a vector of strings is passed?
  Is it right to assume that sqlQuoteIdentifier(drv, c('a', 'b')) should 
  return a vector consisting of quoted results of individual element of the vector?

How do we construct a reference to table with schema, or column with table?
eg schema.table or table.column?
  More specifically, is it right to assume that sqlQuoteIdentifier is used for constructing
  individual part of the composite identifier?

You had a minor mistake in showing the default method (The name is both "sqlQuoteString",
and I am not sure which is intended for sqlQuoteIdentifier).

Another consideration is for the name of the function. Whether we should 
use sql prefix or use db prefix.  I would like to know what others think for this point.

> This not only avoids any problems with SQL reserved words, but
> it also ensures that every field name in R (even those containing
> spaces and other special characters) can be used in SQL.

Here, I think we can avoid problem in most cases, but there are still a bit
cases where the encoding does not allow proper conversion.
That's the problem of the database capability, there is not much things that
the driver can do, though.
-- 
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi, 
Kanazawa, 920-0934, Japan


On 2013/10/19, at 1:44, Hadley Wickham wrote:

> Hi all,
> 
> The approach that DBI takes to escaping is sub-optimal: it tries to
> figure out if an R variable name matches an SQL reserved word, and if
> so munge it so that there's no longer a conflict. This creates a
> situation where there are some identifiers that are valid in R, and
> some that are valid in SQL and we have a complicated and bug prone
> approach to converting between them.
> 
> Instead, I recommend taking an approach where identifiers (i.e. table
> and field names) are always quoted using the appropriate database
> syntax. This not only avoids any problems with SQL reserved words, but
> it also ensures that every field name in R (even those containing
> spaces and other special characters) can be used in SQL.
> 
> To achieve this change, I think we should to:
> 
> * deprecate `make.db.names()`, `isSQLKeyword()`, and `SQLKeywords()`
> * add new generics `sqlQuoteString()` and `sqlQuoteIdentifier()`.
> 
> The new generics would be defined on the driver object, and would come
> with default methods as follows:
> 
> ```
> setGeneric("sqlQuoteString", function(drv, x, ...) {
>  standardGeneric("sqlQuoteString")
> })
> setMethod("sqlQuoteString", "DBIDriver", function(drv, x, ...) {
>  x <- gsub('"', '""', x, fixed = TRUE)
>  paste('"', x, '"', sep = "")
> })
> 
> setGeneric("sqlQuoteIdentifer", function(drv, x, ...) {
>  standardGeneric("sqlQuoteIdentifer")
> })
> setMethod("sqlQuoteString", "DBIDriver", function(drv, x, ...) {
>  x <- gsub("'", "''", x, fixed = TRUE)
>  paste("'", x, "'", sep = "")
> })
> ```
> 
> Individual implementations would be encouraged to provide methods that
> use the quoting functions provided by the client library, where
> available.
> 
> Does anyone see any problems with this approach?
> 
> Hadley
> 
> 
> -- 
> Chief Scientist, RStudio
> http://had.co.nz/
> 
> _______________________________________________
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-db




More information about the R-sig-DB mailing list