[R] SQL Server "float" is not handled by RODBC -- Is there a workaround?

Mark Dalphin mdalphin at gmail.com
Fri Oct 16 00:02:24 CEST 2015


Hi Jim,

Yes, your Java versions need to match bit width.

As for drivers, I do not use the Microsoft one; it used to be hard to
obtain for Linux users; I don't know if it still is. I currently use
"jtds" from Sourceforge:
    http://jtds.sourceforge.net/

From their site:
> jTDS is an open source 100% pure Java (type 4) JDBC 3.0 driver for
> Microsoft SQL Server (6.5, 7, 2000, 2005, 2008 and 2012) and Sybase
> Adaptive Server Enterprise (10, 11, 12 and 15).

I always found reading the DBI + DBI-specific driver (eg RPostgreSQL or
RJDBC) documentation challenging. I think it is oriented towards the
programmer who wrote it rather than the end user. It probably needs a
long, wordy vignette like Tim Bunce once wrote for the Perl DBI, but I
am not stepping up to write it (so you don't hear me complaining)! Over
the years, I've gleaned enough and experimented enough that I have
certain methods well worked out. And I have ignored much of the
functionality that has been added over the years; I am sure that the DBI
implementors have put some real gems in there as I stumble upon new ones
all the time.

The help for DBI, "help(package='DBI')", lists many methods. RJDBC is
just an intermediary and it happens to implement most of them. In fact,
I usually read the DBI drivers packages (RPosgreSQL, RJDBC) with an eye
towards find "what is not implemented or what is broken" more than
anything else.

DBIConnection-class                    DBIConnection class.
DBIDriver-class                        DBIDriver class.
DBIObject-class                        DBIObject class.
DBIResult-class                        DBIResult class.
SQL                                    SQL quoting.
dbClearResult                          Clear a result set.
dbColumnInfo                           Information about result types.
dbConnect                              Create a connection to a DBMS.
dbDataType                             Determine the SQL data type of an
object.
dbDisconnect                           Disconnect (close) a connection
dbDriver                               Load and unload database drivers.
dbExistsTable                          Does a table exist?
dbFetch                                Fetch records from a previously
executed query.
dbGetException                         Get DBMS exceptions.
dbGetInfo                              Get DBMS metadata.
dbGetQuery                             Send query, retrieve results and
then clear result set.
dbGetRowCount                          The number of rows fetched so far.
dbGetRowsAffected                      The number of rows affected by
data modifying query.
dbGetStatement                         Get the statement associated with
a result set
dbHasCompleted                         Has the operation completed?
dbIsValid                              Is this DBMS object still valid?
dbListConnections                      List currently open connections.
dbListFields                           List field names of a remote table.
dbListResults                          A list of all pending results.
dbListTables                           List remote tables.
dbReadTable                            Copy data frames to and from
database tables.
dbRemoveTable                          Remove a table from the database.
dbSendQuery                            Execute a statement on a given
database connection.
dbiCheckCompliance                     Check a driver for compliance
with DBI.
make.db.names                          Make R identifiers into legal SQL
identifiers.
transactions                           Begin/commit/rollback SQL
transactions

In short, I usually only use a small subset of what is available:
    dbConnect, dbDisconnect, dbGetQuery, dbSendQuery, dbFetch and
'transactions'

Some of the apparent richness is from things like two methods to get
table information, one detailed and one simple. And since I know my
schemas already, I seldom use either. In addition, there is ample
support for complex DB interaction, including full support for
transactions, allowing multi-table updates with security.

My usual operation is something like this:

library(RJDBC)

jdbcStr <- 'jdbc:jtds:sqlserver://myHost.myLan:myPort/myDbName'

drv <- JDBC("net.sourceforge.jtds.jdbc.Driver",
            "/my/long/path/to/my/JDBC_Drivers/jtds/jtds-1.3.0.jar")

conn <- dbConnect(drv, jdbcStr, user='mark', password='*****')

## See what tables are present (DB exploration; not used in production)
tab <- dbGetTables(conn)

## A typical query if I am to iterate through
rs <- dbSendQuery(conn, 'SELECT TOP 10 * FROM Orders;')
## followed by dbFetch calls

## A more typical query if I can grab the whole thing at once
rs <- dbGetQuery(conn, 'SELECT * FROM Orders;')
print(rs)

dbDisconnect(conn)

Hope this helps; I might be able to help with specific examples for some
of the DBI methods, if you need it. I suspect that what I have written
above will cover most "query SQL", though not the "Insert/Update/Delete"
stuff.

Regards,
Mark Dalphin


On 16/10/15 10:01, jim holtman wrote:
> Mark,
>
> Thanks for the suggestion.  I will have to look into that option.  I
> assume that if I am running on a 64-bit system, I also have to use the
> 64-bit version of Java.  We have had some problems in the past because
> the company standard is a 32-bit version of Java and we had to also
> load in the 64-bit version to work with the XLConnect package.
>
> I was reading the RJDBC package documention and they seem to list a
> lot of methods (e.g., 'dbReadTable'), but don't say what the
> parameters are, or what it returns.  Where do you find this
> information?  Also I notice that I probably have to get the JDBC
> driver for SQL Server from Microsoft and install that - is that correct?
>
> I hate to start mixing approaches since we have a large number of
> scripts that currently use the RODBC package, but I will try to see if
> the approach you proposed do help overcome this problem.
>
>
> Jim Holtman
> Data Munger Guru
>  
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Thu, Oct 15, 2015 at 4:18 PM, Mark Dalphin <mdalphin at gmail.com
> <mailto:mdalphin at gmail.com>> wrote:
>
>     Hi Jim,
>
>     No answers over the course of 24 hours so I'll give it a shot.
>
>     First, I always work under Linux, so my answers may well be worthless
>     for your Windows scenario.
>
>     Second, I don't know if my workaround works as I don't actually have a
>     SQL Server DB using float.
>
>     Now the workaround:
>
>     I have had many problems in the past using ODBC to connect to
>     databases.
>     Nothing I could nail down to a fault in that system, but just no
>     end of
>     problems. Some of that, of course, is due to me generally working
>     under
>     Linux.
>
>     My general workaround that has been clean is to use JDBC instead.
>     There
>     have been hassles at times to set up the RJava, but recent versions of
>     that have installed very easily. Once RJava is in place (and under
>     Windows, you'll have fun setting up Java cleanly), then
>     installation of
>     a JDBC jar (I use jtds from SourceForge for SQL Server) and finally
>     RJDBC. The generic nature of the JDBC interface is a joy to work with,
>     interacting with most database types very well and in a uniform
>     manner.
>
>     So, lots of work getting JDBC up and going to see if an
>     alternative path
>     into your DB gets you your data in a better format. Now you see why I
>     waited 24 hours to say anything at all ...
>
>     Also, it might be worth while posting on the DB specific maillist:
>         https://stat.ethz.ch/mailman/listinfo/r-sig-db
>
>     Hope this helps,
>     Mark Dalphin
>
>
>     On 15/10/15 07:23, jim holtman wrote:
>     > Here is the system I am using:
>     > =====================================
>     >> sessionInfo()
>     > R version 3.2.2 (2015-08-14)
>     > Platform: x86_64-w64-mingw32/x64 (64-bit)
>     > Running under: Windows 7 x64 (build 7601) Service Pack 1
>     >
>     > locale:
>     > [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>     > States.1252
>     > [3] LC_MONETARY=English_United States.1252
>     > LC_NUMERIC=C
>     > [5] LC_TIME=English_United States.1252
>     >
>     > attached base packages:
>     > [1] stats     graphics  grDevices utils     datasets  methods   base
>     >
>     > other attached packages:
>     > [1] lubridate_1.3.3 RODBC_1.3-12
>     > loaded via a namespace (and not attached):
>     > [1] magrittr_1.5  plyr_1.8.3    tools_3.2.2   memoise_0.2.1
>     Rcpp_0.12.1
>     > stringi_0.5-5 digest_0.6.8
>     > [8] stringr_1.0.0
>     > ========================================
>     >
>     > I have data on a SQL Server that I am connecting to where some
>     of the
>     > fields are defined as "float" so that the data is stored in the
>     database as
>     > an IEEE 754 value.  Now when I read this is using RODBC, the
>     data comes
>     > across the interface in the floating point format; I used
>     Wireshark to
>     > examine the packets that were being sent.  Some of the data is
>     also defined
>     > as "int" and comes across in binary.
>     >
>     > When the data is read in with
>     >
>     > df <- sqlQuery(db, "select * from mydb", as.is <http://as.is> =
>     TRUE)
>     >
>     > The resulting dataframe has the floating point values as 'chr'
>     and the
>     > integer fields as 'int'; I would have expected the floating
>     point fields to
>     > be 'num'.  Now in the "ODBC Connectivity" Vignette by Ripley
>     there was the
>     > comment that "double" data values come back as type 8, but on
>     some systems
>     > they may be type 6; well on SQL Server, "float" is type 6.
>     >
>     > So what appears to happen, is this data is not recognized as a
>     floating
>     > point value and is therefore converted to a character.  When the
>     data is
>     > made available to the R script, I then have to convert this back to
>     > floating point.  If I use "stringsAsFactors = FALSE" on the
>     query, this
>     > conversion back to floating point will be done within the RODBC
>     package.
>     > This becomes a problem when I have dataframes with several
>     million rows and
>     > multiple columns of numerics is that the conversion to/from
>     characters is
>     > adding time to the processing.
>     >
>     > So I was wondering is there a workaround to this problem?  Is it
>     possible
>     > to add the capability to RODBC when processing SQL Server to
>     avoid this
>     > conversion?  Or is there some other way around this problem?
>     >
>     > Jim Holtman
>     > Data Munger Guru
>     >
>     > What is the problem that you are trying to solve?
>     > Tell me what you want to do, not how you want to do it.
>     >
>     >       [[alternative HTML version deleted]]
>     >
>     > ______________________________________________
>     > R-help at r-project.org <mailto:R-help at r-project.org> mailing list
>     -- To UNSUBSCRIBE and more, see
>     > https://stat.ethz.ch/mailman/listinfo/r-help
>     > PLEASE do read the posting guide
>     http://www.R-project.org/posting-guide.html
>     > and provide commented, minimal, self-contained, reproducible code.
>     >
>
>     ______________________________________________
>     R-help at r-project.org <mailto:R-help at r-project.org> mailing list --
>     To UNSUBSCRIBE and more, see
>     https://stat.ethz.ch/mailman/listinfo/r-help
>     PLEASE do read the posting guide
>     http://www.R-project.org/posting-guide.html
>     and provide commented, minimal, self-contained, reproducible code.
>
>


	[[alternative HTML version deleted]]



More information about the R-help mailing list