[R-sig-DB] Basic R to SQL considerations

MacQueen, Don m@cqueen1 @end|ng |rom ||n|@gov
Wed Feb 26 17:19:24 CET 2014


I have a couple of things to add.

For the R-to-Oracle connection, the ROracle package appears to now be
supported by Oracle, which to me suggests some confidence in ongoing
future support. For more information, see
https://blogs.oracle.com/R/entry/r_to_oracle_database_connectivity

Another option is JDBC
(http://en.wikipedia.org/wiki/Java_Database_Connectivity), also mentioned
in the Oracle blog cited above. I have had success connecting to Oracle
using RJDBC from a Mac when other options were not available to me.
Quoting from the Oracle blog cited above, "Any database that supports a
JDBC driver can be used in connection with RJDBC." CRAN has both Mac and
Windows binaries for RJDBC.

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 2/24/14 7:45 PM, "Paul Gilbert" <pgilbert902 using gmail.com> wrote:

>(This is prompted by an R-help  post entitled '[R] Problem connecting to
>database via RPostgreSQL/RS-DBI: "could not connect" error' but this
>post is not specific to RPostgreSQL)
>
>The general questions in the above R-help post reminded me of problems I
>had when starting to use SQL databases in conjunction with R. Below I
>try to provide some general comments and guidance with respect to
>options for using SQL databases from R. The main purpose of this post is
>so than anyone just starting to use R with SQL may find something to
>help with some of the initial decisions they must make. I hope that
>others will respond by posting additional detail where that seems
>warranted. My knowledge in some parts is limited.
>
>
>In general SQL databases operate on a client/server model. R is a client
>program and will use the client interface of the database. The client
>interface drivers typically must be installed separately and prior to
>installing the R package that uses the client interface. Most client
>interfaces also include a standalone client (e.g. sqlite, mysql, psql).
>Among other things, this is useful for ensuring your client machine can
>connect to the database you intend to use, as it does not rely on R or
>any R packages. Before concluding that the R package does not work, you
>should ensure that the stand alone package does work.
>
>You will also need a database to talk to, which means (excepting SQLite)
>a computer with the server software installed and set up, a database
>defined, and appropriate privileges for your client machine and userid.
>Often an R user will not need to do this because the server and database
>is provided by someone else (e.g. a "corporate" database within an
>organization.) When the database is provided, the choice of which R
>client package to use will be dictated by that. When the R user must
>also set up the database, then the main hurdle is likely to be the
>server side rather than the client side. (The user may need to learn a
>fair amount about system administration to install the server, and then
>a certain amount about database administration to set up the database
>with appropriate privileges.)
>
>It is not unusual for the server to be a Unix or Linux machine and the
>client to have a different operating system, like Windows or Mac. In
>fact, this may be the most common situation. The SQL interfaces handle
>the cross-platform issues between server and client.
>
>The notion of a "totally local" database fits in the SQL context in a
>way that will be unfamiliar to users of standalone programs (but see
>further below about SQLite). Even for a local database, the server must
>still be running, in this case on the same machine as the client, and
>must be listening for requests from client applications. That means it
>is configured to listen on a local socket, the loopback interface, or
>the IP address of the local machine, and recognizes the userid/password
>or some other security mechanism. This server can still respond to
>multiple client applications running simultaneously, for example, two R
>sessions, and to multiple users on the machine. So this is not "totally
>local" in the sense that many people may think of that term.
>
>A message like "could not connect" or "Can't connect" in a client
>application indicates that the client cannot talk to the server. It is
>unlikely that you will get this far if the client software is not
>installed properly. The problem is that the server is not running, not
>connected to a network the client can reach, blocked by a firewall, not
>configured to listen for the client on the interface that the client is
>trying to use, or the client username/password or other authentication
>mechanism is not recognized by the server. The last problem can be
>either that the client is not using the proper credentials or the server
>is not set up properly to recognize the credentials.
>
>You should get the same "could not connect"  message if you run the
>standalone client on the same machine. If you don't, then you are using
>different credentials with the standalone client than you are from the R
>session and that is the problem you need to resolve. You will probably
>need to talk with your system administrator or database administrator if
>you are trying to connect to a "corporate" database. In this case it is
>usually best to report the errors as produced by the standalone client.
>Otherwise you are likely to confuse the administrator into thinking the
>problem is in R. (And in many cases that will mean they then consider it
>is your problem rather than their problem.)
>
>It is sometimes possible to work around a server setup that does not
>allow remote connections by setting up an ssh tunnel, so the connection
>appears to the server as if it is local. If the connection is over the
>Internet this might also add some additional security in the transmission.
>
>
>  ***** Relative strengths of different options *****
>
>As mentioned above, if you are trying to connect to a database provided
>by someone else then your choice of which client to use is already
>decided, and you do not need to consider this section.
>
>SQLite (www.sqlite.org/) is by far the easiest to set up. Installation
>is almost automatic.  R package Windows and Mac binaries are available
>from CRAN. If you are thinking "totally local" then this is very likely
>the option to choose. It is "serverless". The database is in a file
>which could be read by more than one session simultaneously, but writing
>by multiple sessions will cause difficulties. Permissions are controlled
>by the file permissions (I think). A good discussion of whether SQLite
>is appropriate or not is provided at www.sqlite.org/whentouse.html. To
>compile, the source package compiling tools need to be installed before
>the R package is installed. The R package is RSQLite.
>
>MySQL (www.mysql.com/) requires a server, which is installed and set up
>separately from the client. An R package Mac binary is available from
>CRAN but the Windows binaries are no longer supported (see
>cran.at.r-project.org/bin/windows/contrib/r-release/ReadMe). The client
>interface library binaries need to be installed. To compile, the source
>package compiling tools need to be installed before the R package is
>installed. The R package is RMySQL.
>
>PostgreSQL (postgresql.org/) requires a server, which is installed and
>set up separately from the client.  R package Windows and Mac binaries
>are available from CRAN. The client interface library binaries need to
>be installed. To compile, the source package compiling tools need to be
>installed before the R package is installed. The R package is RPostgreSQL.
>
>Oracle (www.oracle.com) is at the "heavy duty" end of the spectrum. It
>seems unlikely that one would choose this without having a specialized
>database administrator to install and set up the database and set user
>credentials.  R package Windows and Mac binaries are not available from
>CRAN (see cran.at.r-project.org/web/packages/ROracle/index.html). The
>client interface library binaries need to be installed and, to compile,
>the source package compiling tools need to be installed before the R
>package is installed. The R package is ROracle.
>
>The choice of SQLite or Oracle may be relatively straightforward, but
>between MySQL and PostgreSQL the choice is more difficult. There has
>been considerable convergence and, apparently, many of the historical
>differences no longer exist. One comparison is available at
>www.wikivs.com/wiki/MySQL_vs_PostgreSQL, and there are undoubtedly
>others. In the end, the choice might come down to options for purchasing
>support, or a preferred philosophy about the development model. The R
>packages are both mature and well supported (by volunteers). Your choice
>might be influenced more by the server side considerations than by R
>client considerations. Both have been more than adequate for my
>purposes. One difference I found, when loading csv data directly on the
>server is that MySQL allowed more incorrectly formatted data (e.g.
>dates). That is, it made more guesses, sometimes correctly and sometimes
>not. The good guesses may be consider a feature, but PostgreSQL may be a
>better option if you consider the loading of csv data to be part of the
>data cleaning process. This will usually not be a consideration if you
>are loading the data from R. (For what it is worth, my recommendation is
>to use both and pay attention to writing standard SQL so you can switch
>back and forth easily - but I know that people do not always follow my
>recommendations.)
>
>Another option is ODBC (see http://en.wikipedia.org/wiki/ODBC). ODBC
>provides a middleware API which can be useful for further standardizing
>your interface to the database. The main reason for using this is likely
>to be that it is the interface supported by a database you need to
>access. The server and client will need ODBC layers in place of, or in
>addition to, their usual interfaces. ODBC has its own configuration. If
>you are setting up the database yourself this is likely to be more
>rather than less complicated. You still need to install and set up the
>server ( MySQL, PostgreSQL, or Oracle). The R package is RODBC.
>
>
>Installation and setup are beyond the scope of this post. Perhaps
>someone else can attempt that or point to instructions elsewhere. Just
>beware that you need to distinguish installation and setup of the client
>from installation and setup of the server, and you may need both. There
>are probably other lists with better discussions of the server side,
>since it really is not specific to R.
>
>Paul
>
>_______________________________________________
>R-sig-DB mailing list -- R Special Interest Group
>R-sig-DB using r-project.org
>https://stat.ethz.ch/mailman/listinfo/r-sig-db




More information about the R-sig-DB mailing list