[R] Workaround for RODBC asymmetric numeric data treatment
Andrew
nodecorum at yahoo.com
Fri Oct 3 19:39:31 CEST 2014
Note: I did raise report the issue below to r-sig-db at r-project.org, but didn't see any reply.
I'm hoping somebody on r-help can help me devise a workaround for a problem I'm having
with RODB:
I use RODBC to read and write a good deal of data to SQL Server and I'd be extremely grateful
if anyone has found a workaround in order to be able to write dataframes to SQL Server
using RODBC dynamically created SQL tables and read the data from those tables, or indeed any
arbitrary SQL Server table with "float" datatypes and end up with numeric columns instead of "factor" columns
in a dataframe in R.
I have found that when RODBC creates a Microsoft SQL Server data table from a dataFrame using sqlSave(....append=FALSE),
RODBC uses the SQL "float" datatype to store R numeric data in a dynamically-created table on the server.
However, when RODBC reads any SQL Server "float" datatype from SQL Server via sqlQuery it interprets float columns as "factor" data.
I created a standalone sample below to demonstrate the odd behavior of RODBC that I hope to overcome:
# Assuming the reader has access to SQL Server the code below is self-contained and repeatable
# I believe it demonstrates unexpected and undesirable behavior in RODBC
library(RODBC)
library(fPortfolio)
library(timeSeries)
head(SWX.RET$SBI)
str(SWX.RET$SBI)
mydata<-as.timeSeries(SWX.RET)
head(mydata)
df2beSavedByRODBC =as.data.frame(mydata)
str(df2beSavedByRODBC)
# shows the numeric data in the dataframe
#
# data.frame': 1916 obs. of 6 variables:
# $ SBI : num -0.002088 -0.000105 -0.00136 0.000419 0 ...
# $ SPI : num -0.03439 -0.01041 0.01212 0.02246 0.00211 ...
...
# Let's save the dataframe to SQL Server:
dbconn<-odbcDriverConnect(connection="Driver={SQL Server};server=_YOURSERVERNAMEHER_;database=_YOURDBNAME_;Trusted_Connection=True;");
sqlSave(channel=dbconn,dat=df2beSavedByRODBC,tablename="testTable",rownames=TRUE,append=FALSE,addPK=FALSE,verbose=FALSE)
# The sqlSave above works very well. The new table is create in the Microsoft SQL database and the ddl for the table is:
#
# [dbo].[testTable](
# [rownames] [varchar](255) NULL,
# [SBI] [float] NULL,
# [SPI] [float] NULL,
# [SII] [float] NULL,
# [LP25] [float] NULL,
# [LP40] [float] NULL,
# [LP60] [float] NULL
# )
# The numeric values from the dataframe are stored as float (i.e. numeric) in SQL server -- good!
## now let's read back the data RODBC stored in SQL server from a SQL table RODBC created:
sqlString = "select * from testTable"
dataFrameFromDB = sqlQuery(dbconn,sqlString,errors=TRUE);
str(dataFrameFromDB)
#
# 'data.frame': 1916 obs. of 7 variables:
# $ rownames: Factor w/ 1916 levels "2000-01-04","2000-01-05",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ SBI : Factor w/ 1742 levels "-0.00041080415489958",..: 349 42 161 1418 828 48 49 1419 1024 135 ...
# $ SPI : Factor w/ 1848 levels "-0.0020169904194276",..: 445 48 970 883 1187 377 1157 1065 951 1840 ...
...
#********* RODBC wrote numeric data to SQL Server as float, but read the same data back as Factor ! ********
I could use some help to create a robust and flexible workaround for RODBC's asymmetric treatment of numeric data.
If there were some way to force RODBC sqlQuery to interpret all SQL Server float datatypes as numeric my problem would be solved.
FWIW: RODBC does interpret the SQL Server "real" datatype as numeric.
Thank you,
Andrew
[[alternative HTML version deleted]]
More information about the R-help
mailing list