[R] tips to speed up sqlSave with MS SQL Server?

jim holtman jholtman at gmail.com
Tue Apr 29 03:14:17 CEST 2008


?save

This will write the object to a file in the fastest manner.  Here is an example:

> x <- runif(125000000/8)
> object.size(x)
[1] 125000024
> system.time(save(x, file='/tempxx.Rdata'))
   user  system elapsed
  56.84    0.86   87.97
>

This wound up to be 79MB on disk after compression.

Without compression, most of the time is my slow disk:

> system.time(save(x, file='/tempxx.Rdata', compress=FALSE))
   user  system elapsed
   5.07    1.27   56.66

The size on disk was 119MB.

On Mon, Apr 28, 2008 at 8:51 PM, Thomas Pujol <thomas.pujol at yahoo.com> wrote:
> Prof,
> Thanks for your generous assistance.
>
> I'm unsure, but an thinking that to utilize one of MS SQL Server's bulk import utilities, I'll need to export my dataframe to a "flat-file".
>
> Any tips on the best approach for exporting such a large dataframe to a flat-file? Is write() or write.table() the "best" function to use, or are there others I should consider?
>
> Also, not specifc to R, but I came across this:
> "Flat File Bulk Import methods speed comparison in SQL Server 2005"
> http://weblogs.sqlteam.com/mladenp/archive/2006/07/17/10634.aspx
>
> Thanks again.
>
> Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
>
>    I think the short answer is that RODBC is not designed for that, because
>    ODBC is not. There seems to be an ODBC extension specific to SQL Server
>    to do so (somewhere said 'SQL Server version 7.0 or later', which may not
>    apply to you).
>
>    I'm pretty unlikely to add support for just one database, especially one
>    that requires files from SQL Server. Also, I don't know of any RODBC /SQL
>    Server users who might be motivated to do so.
>
>    There is work in progress to implement SQLBulkOperations, but that is a
>    different concept (and not yet wired up to sqlSave).
>
>    On Mon, 28 Apr 2008, Thomas Pujol wrote:
>
>    > I am using R2.6.0 on ?Windows Small Business Server 2003?. I apologize
>    > if the answer to my question is available?I have searched but have not
>    > found anything that I thought helped me.
>    >
>    > I have a dataframe that contains ~4.5 million rows and 5 columns.
>    > (see memory and df details below). I am trying to save the dataframe to
>    > a MS SQL Server database, using the ?sqlSave? function. The code below
>    > seems to work, but takes several hours.
>    >
>    > ?sqlSave(channel, dat=idxdata, varTypes=c(ddates="datetime") )?
>    >
>    > Any tips how I can speed things up? Or is my dataframe so large that it
>    > is going to take a while? (I have ~20 dataframes that I need to save to
>    > SQL, so speed is somewhat important.) Is there an altogether different
>    > approach I should consider taking?
>
>    Use a different client that does implement bulk copy operations? At least
>    SQL Server 2005 comes with a bcp.exe command-line client to do this. See
>    http://msdn2.microsoft.com/en-us/library/ms188728.aspx
>
>
>    > FYI, here is information re: the dataframe and memory on my system.
>    > Please let me know if there is any further information I should provide.
>    >
>    > > memory.size(max = F) #reports amount of memory currently in use
>    > [1] 131.8365
>    >
>    > > str(idxdata)
>    > 'data.frame': 4474553 obs. of 5 variables:
>    > $ idkey : int 1003 1003 1003 1003 1003 1003 1003 1003 1003 1003 ...
>    > $ nnd : Factor w/ 25 levels "01","01C","02",..: 1 1 1 1 1 1 1 1 1 1 ...
>    > $ curcdd : Factor w/ 2 levels "CAD","USD": 2 2 2 2 2 2 2 2 2 2 ...
>    > $ ddates:Class 'Date' num [1:4474553] 6942 6943 6944 6945 6948 ...
>    > $ idx : num 100 100 100 100 100 100 100 100 100 100 ...
>    >
>    > > object.size(idxdata)
>    > [1] 125289688
>
>    --
>    Brian D. Ripley, ripley at stats.ox.ac.uk
>    Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>    University of Oxford, Tel: +44 1865 272861 (self)
>    1 South Parks Road, +44 1865 272866 (PA)
>    Oxford OX1 3TG, UK Fax: +44 1865 272595
>
>
> ---------------------------------
> [[elided Yahoo spam]]
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



More information about the R-help mailing list