[R-sig-DB] Long query time running R + RODBC against SQL Server

Doug Hill DH||| @end|ng |rom h@g|n|m@com
Mon Oct 20 18:16:01 CEST 2008


Hi, all. I'm using RODBC to query a SQL Server 2005 database, and am experiencing very slow performance. I assume I'm doing something wrong, but I'm not doing anything fancy, so I'm stumped. 

My setup: 

Core 2 Duo 2GHz, Win XP, 2GB RAM 
R 2.7.1 
RODBC (latest from CRAN, not sure the version - how to tell?) 
SQL Server 2005 
MS Access 2003 
SQL Native Client ODBC driver (DSN = "HIM") 
SQL Server ODBC driver (DSN = "HIMs") 
MS Access ODBC driver (DSN = "HIMa") 

Table being queried has ~290K rows, of which the query being run should return ~78K. 

Table being queried has 142 columns, of which all but 5 are type float. The other 5 are nvarchar(255) .

No indices, keys, or joins - one flat table 

No UPDATEs, INSERTs, just a SELECT 

R code looks like this: 

him <- odbcConnect("HIM", "CDCS", "*****") 
FAC <- sqlQuery(him, "Select * from T304 where f133=1") 

If I use either SQL Server DSN, the query seems to hang, though I can see in Task Manager that Rgui.exe grows up to point, and uses most of one CPU. I killed R after waiting 13+ minutes. 
 
If I use sqlFetch() instead and specify max records as 1000, this does return but only after ~2 minutes.
 
I have tried both these calls with believeNRows=FALSE, with the same result.

In SQL Server itself, the same query completes after ~10 seconds. 

If I use the Access DSN in my R code above, the query takes 12-15s. It's the same data in Access, in fact it was imported into SQL Server 2005 from the Access DB. 

Any ideas how a query against SQL Server could be that much slower than against Access? I understand there are conversions performed for the 5 non-float columns, but I believe that would be true in the Access case as well. 
 
Any troubleshooting ideas?  

Thanks!

Doug 

Internal Virus Database is out-of-date.
Checked by AVG. 

7:50 AM
 

	[[alternative HTML version deleted]]




More information about the R-sig-DB mailing list