[R] Fast update of a lot of records in a database?

Duncan Murdoch murdoch at stats.uwo.ca
Sat May 20 03:17:57 CEST 2006


On 5/19/2006 3:19 PM, hadley wickham wrote:
>> put the updates into a temporary table called updates
>>
>> UPDATE bigtable AS a
>>    SET col1 = b.col1
>>    FROM updates AS b
>>    WHERE a.id = b.id
> 
> I don't think this will be any faster - why would creating a new table
> be faster than updating existing rows?

I don't know, but I assumed each SQL statement resulted in some sort of 
turnaround delay with the database.  Creating the new table requires 
just a few statements, one of which is a huge INSERT statement.
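
In R terms, the approach would be something like this (a sketch, 
assuming RODBC against a PostgreSQL backend, a data frame new_values 
with columns id and col1, and a made-up DSN name):

    library(RODBC)
    channel <- odbcConnect("mydsn")

    ## Load all the new values in one operation; fast = TRUE makes
    ## sqlSave() use a parametrized INSERT instead of one literal
    ## statement per row.
    sqlSave(channel, new_values, tablename = "updates",
            rownames = FALSE, fast = TRUE)

    ## A single joined UPDATE then applies every change at once.
    sqlQuery(channel,
             "UPDATE bigtable AS a
                 SET col1 = b.col1
                FROM updates AS b
               WHERE a.id = b.id")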

I haven't had a chance yet to do the timing, but creating a test update 
table with 400,000 new values hadn't finished after an hour.
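
One thing worth trying there: turn off RODBC's per-statement autocommit 
so the whole load runs as a single transaction rather than one commit 
per row.  A sketch, reusing the channel from above:

    ## Batch the entire load into one transaction; under autocommit
    ## every statement otherwise forces its own commit on the server.
    odbcSetAutoCommit(channel, autoCommit = FALSE)
    sqlSave(channel, new_values, tablename = "updates",
            rownames = FALSE, fast = TRUE)
    odbcEndTran(channel, commit = TRUE)
    odbcSetAutoCommit(channel, autoCommit = TRUE)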

> I've never had a problem with using large numbers of SQL update
> statements (in the order of hundreds of thousands) to update a table
> and having them complete in a reasonable time (a few minutes).  How
> heavily indexed is the field you are updating?  

The field being updated isn't indexed; the id field is.

> You may be able to get
> some speed improvements by turning off indices before the update and
> back on again afterwards (highly dependent on your database system
> though).
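
For reference, on PostgreSQL that would look something like the sketch 
below (the index name is hypothetical), though here the only index is 
on id, which the WHERE clause needs, so dropping it would probably hurt 
rather than help:

    sqlQuery(channel, "DROP INDEX bigtable_id_idx")
    ## ... run the updates ...
    sqlQuery(channel, "CREATE INDEX bigtable_id_idx ON bigtable (id)")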
> 
> I would strongly suspect your bottleneck lies elsewhere (eg. when
> generating the statements in R, or using ODBC to send them)

It's not in R (that goes quickly), but I don't know how to distinguish 
ODBC slowdowns from database slowdowns.  On my machine ODBC is the only 
option.
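
The best I can think of is to time a trivial round trip against a 
single one-row UPDATE (a sketch, with the table and column names from 
above):

    ## Round-trip cost alone: the query does essentially no server work.
    system.time(sqlQuery(channel, "SELECT 1"))

    ## A one-row update: any extra time over the bare round trip is
    ## database work.
    system.time(sqlQuery(channel,
        "UPDATE bigtable SET col1 = col1 WHERE id = 1"))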

Part of the problem is probably that I am remote from the server, using 
ODBC over an SSH tunnel.  But the original "thousands of UPDATES" was 
taking hours running on a machine much closer to the server.  It was 
using RdbiPgsql; I'm using RODBC.  Doing the SELECT takes me a couple of 
minutes, and is faster locally; why is the UPDATE so slow?

Duncan



