[R-sig-DB] SQL generics

Wed Oct 23 01:13:53 CEST 2013

> Because of the sql prefix my understanding is that the generics below
> (sqlCreateTable(), etc...) only generate strings containing valid SQL
> but that they actually don't talk to the db. I understand that you
> need the 'conn' argument for dispatch but it might be a little bit
> confusing.

Right.  Dispatching on conn would be a bit confusing for users, but I
think these would mostly be used by package authors.

> Are you also considering to provide dbCreateTable(), dbDropTable(),
> dbInsertInto(), dbUpdate(), and dbDeleteFrom() generics?

No, I don't think so - the layer of commands built on top of the sql*
functions would be more like dbWriteTable - i.e. they'd map to
operations that make more sense from the R side of things, rather than
being a straightforward translation to things that make sense to
databases.

> Do we really need the 2 sets? If not, which one is most useful?
> Alternatively the 2nd set could have a 'return.SQL.only' arg (FALSE
> by default) so the user/developer has a way to know what SQL
> is sent to the db (can be useful for learning/debugging, or for
> "retouching" the query before sending).

I'm not a big fan of functions that return different types based on
their input arguments because they're harder to program against.
However, you could imagine these functions returning some sort of
reference class from which you could either extract the sql, or you
could run the query once. But that seems a bit too complicated for
DBI.

> If it was possible to use only high-level back-end agnostic commands
> like this in our R code when talking to a db, then it would be easy
> to support multiple back ends. Sounds like a noble goal in theory but
> it might be hard to achieve. As you noticed the grammar of the SELECT
> statement is complicated and complex statements can require a lot of
> back-end specific tweaking in order to perform efficiently.

Yes, I've spent quite a lot of time on this for the dplyr package,
which provides a syntax that allows you to work with data frames, data
tables and databases (sqlite, mysql, postgresql and google big query)
without having to worry too much about the how the data is stored. But
the only way to do this is to carefully constrain the problem domain,
and I think there's too much variation in SQL grammars for SQL to
serve as the abstraction layer (not without a huge amount of work
anyway).

Some of the methods might only actually be useful transiently while
you're developing a new DBI package (i.e. once you've implemented
sqlCreateTable and sqlInsertInto you get an inefficient dbWriteTable
for free), or for testing. But I don't think that's necessarily a
problem.

Hadley

-- 
Chief Scientist, RStudio
http://had.co.nz/