[R] tips to speed up sqlSave with MS SQL Server?

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Apr 28 21:28:28 CEST 2008


I think the short answer is that RODBC is not designed for that, because 
ODBC is not.  There seems to be an ODBC extension specific to SQL Server 
to do so (somewhere said 'SQL Server version 7.0 or later', which may not 
apply to you).

I'm pretty unlikely to add support for just one database, especially one 
that requires files from SQL Server.  Also, I don't know of any RODBC /SQL 
Server users who might be motivated to do so.

There is work in progress to implement SQLBulkOperations, but that is a 
different concept (and not yet wired up to sqlSave).

On Mon, 28 Apr 2008, Thomas Pujol wrote:

> I am using R2.6.0 on ?Windows Small Business Server 2003?.  I apologize 
> if the answer to my question is available?I have searched but have not 
> found anything that I thought helped me.
>
>  I have a dataframe that contains ~4.5 million rows and 5 columns. 
> (see memory and df details below).  I am trying to save the dataframe to 
> a MS SQL Server database, using the ?sqlSave? function.  The code below 
> seems to work, but takes several hours.
>
>  ?sqlSave(channel, dat=idxdata, varTypes=c(ddates="datetime") )?
>
>  Any tips how I can speed things up? Or is my dataframe so large that it 
> is going to take a while? (I have ~20 dataframes that I need to save to 
> SQL, so speed is somewhat important.)  Is there an altogether different 
> approach I should consider taking?

Use a different client that does implement bulk copy operations?  At least 
SQL Server 2005 comes with a bcp.exe command-line client to do this.  See
http://msdn2.microsoft.com/en-us/library/ms188728.aspx


>  FYI, here is information re: the dataframe and memory on my system. 
> Please let me know if there is any further information I should provide.
>
>  > memory.size(max = F) #reports amount of memory currently in use
>  [1] 131.8365
>
>  > str(idxdata)
>  'data.frame':   4474553 obs. of  5 variables:
>   $ idkey   : int  1003 1003 1003 1003 1003 1003 1003 1003 1003 1003 ...
>   $ nnd     : Factor w/ 25 levels "01","01C","02",..: 1 1 1 1 1 1 1 1 1 1 ...
>   $ curcdd  : Factor w/ 2 levels "CAD","USD": 2 2 2 2 2 2 2 2 2 2 ...
>   $ ddates:Class 'Date'  num [1:4474553] 6942 6943 6944 6945 6948 ...
>   $ idx     : num  100 100 100 100 100 100 100 100 100 100 ...
>
>  > object.size(idxdata)
>  [1] 125289688

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-help mailing list