[R-sig-DB] Long query time running R + RODBC against SQL Server
Doug Hill
DH||| @end|ng |rom h@g|n|m@com
Mon Oct 20 18:16:01 CEST 2008
Hi, all. I'm using RODBC to query a SQL Server 2005 database, and am experiencing very slow performance. I assume I'm doing something wrong, but I'm not doing anything fancy, so I'm stumped.
My setup:
Core 2 Duo 2GHz, Win XP, 2GB RAM
R 2.7.1
RODBC (latest from CRAN, not sure the version - how to tell?)
SQL Server 2005
MS Access 2003
SQL Native Client ODBC driver (DSN = "HIM")
SQL Server ODBC driver (DSN = "HIMs")
MS Access ODBC driver (DSN = "HIMa")
Table being queried has ~290K rows, of which the query being run should return ~78K.
Table being queried has 142 columns, of which all but 5 are type float. The other 5 are nvarchar(255) .
No indices, keys, or joins - one flat table
No UPDATEs, INSERTs, just a SELECT
R code looks like this:
him <- odbcConnect("HIM", "CDCS", "*****")
FAC <- sqlQuery(him, "Select * from T304 where f133=1")
If I use either SQL Server DSN, the query seems to hang, though I can see in Task Manager that Rgui.exe grows up to point, and uses most of one CPU. I killed R after waiting 13+ minutes.
If I use sqlFetch() instead and specify max records as 1000, this does return but only after ~2 minutes.
I have tried both these calls with believeNRows=FALSE, with the same result.
In SQL Server itself, the same query completes after ~10 seconds.
If I use the Access DSN in my R code above, the query takes 12-15s. It's the same data in Access, in fact it was imported into SQL Server 2005 from the Access DB.
Any ideas how a query against SQL Server could be that much slower than against Access? I understand there are conversions performed for the 5 non-float columns, but I believe that would be true in the Access case as well.
Any troubleshooting ideas?
Thanks!
Doug
Internal Virus Database is out-of-date.
Checked by AVG.
7:50 AM
[[alternative HTML version deleted]]
More information about the R-sig-DB
mailing list