[R] tips to speed up sqlSave with MS SQL Server?
jim holtman
jholtman at gmail.com
Tue Apr 29 03:14:17 CEST 2008
?save
This will write the object to a file in the fastest manner. Here is an example:
> x <- runif(125000000/8)
> object.size(x)
[1] 125000024
> system.time(save(x, file='/tempxx.Rdata'))
user system elapsed
56.84 0.86 87.97
>
This wound up to be 79MB on disk after compression.
Without compression, most of the time is my slow disk:
> system.time(save(x, file='/tempxx.Rdata', compress=FALSE))
user system elapsed
5.07 1.27 56.66
The size on disk was 119MB.
On Mon, Apr 28, 2008 at 8:51 PM, Thomas Pujol <thomas.pujol at yahoo.com> wrote:
> Prof,
> Thanks for your generous assistance.
>
> I'm unsure, but an thinking that to utilize one of MS SQL Server's bulk import utilities, I'll need to export my dataframe to a "flat-file".
>
> Any tips on the best approach for exporting such a large dataframe to a flat-file? Is write() or write.table() the "best" function to use, or are there others I should consider?
>
> Also, not specifc to R, but I came across this:
> "Flat File Bulk Import methods speed comparison in SQL Server 2005"
> http://weblogs.sqlteam.com/mladenp/archive/2006/07/17/10634.aspx
>
> Thanks again.
>
> Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
>
> I think the short answer is that RODBC is not designed for that, because
> ODBC is not. There seems to be an ODBC extension specific to SQL Server
> to do so (somewhere said 'SQL Server version 7.0 or later', which may not
> apply to you).
>
> I'm pretty unlikely to add support for just one database, especially one
> that requires files from SQL Server. Also, I don't know of any RODBC /SQL
> Server users who might be motivated to do so.
>
> There is work in progress to implement SQLBulkOperations, but that is a
> different concept (and not yet wired up to sqlSave).
>
> On Mon, 28 Apr 2008, Thomas Pujol wrote:
>
> > I am using R2.6.0 on ?Windows Small Business Server 2003?. I apologize
> > if the answer to my question is available?I have searched but have not
> > found anything that I thought helped me.
> >
> > I have a dataframe that contains ~4.5 million rows and 5 columns.
> > (see memory and df details below). I am trying to save the dataframe to
> > a MS SQL Server database, using the ?sqlSave? function. The code below
> > seems to work, but takes several hours.
> >
> > ?sqlSave(channel, dat=idxdata, varTypes=c(ddates="datetime") )?
> >
> > Any tips how I can speed things up? Or is my dataframe so large that it
> > is going to take a while? (I have ~20 dataframes that I need to save to
> > SQL, so speed is somewhat important.) Is there an altogether different
> > approach I should consider taking?
>
> Use a different client that does implement bulk copy operations? At least
> SQL Server 2005 comes with a bcp.exe command-line client to do this. See
> http://msdn2.microsoft.com/en-us/library/ms188728.aspx
>
>
> > FYI, here is information re: the dataframe and memory on my system.
> > Please let me know if there is any further information I should provide.
> >
> > > memory.size(max = F) #reports amount of memory currently in use
> > [1] 131.8365
> >
> > > str(idxdata)
> > 'data.frame': 4474553 obs. of 5 variables:
> > $ idkey : int 1003 1003 1003 1003 1003 1003 1003 1003 1003 1003 ...
> > $ nnd : Factor w/ 25 levels "01","01C","02",..: 1 1 1 1 1 1 1 1 1 1 ...
> > $ curcdd : Factor w/ 2 levels "CAD","USD": 2 2 2 2 2 2 2 2 2 2 ...
> > $ ddates:Class 'Date' num [1:4474553] 6942 6943 6944 6945 6948 ...
> > $ idx : num 100 100 100 100 100 100 100 100 100 100 ...
> >
> > > object.size(idxdata)
> > [1] 125289688
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
>
> ---------------------------------
> [[elided Yahoo spam]]
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
More information about the R-help
mailing list