[R] Fast update of a lot of records in a database?
Duncan Murdoch
murdoch at stats.uwo.ca
Sat May 20 03:17:57 CEST 2006
On 5/19/2006 3:19 PM, hadley wickham wrote:
>> put the updates into a temporary table called updates
>>
>> UPDATE bigtable AS a
>> SET col1 = b.col1
>> FROM updates AS b
>> WHERE a.id = b.id
>
> I don't think this will be any faster - why would creating a new table
> be faster than updating existing rows?
I don't know, but I assumed each SQL statement resulted in some sort of
turnaround delay with the database. Creating the new table requires
just a few statements, one of which is a single huge INSERT.
I haven't had a chance to do the timing yet, but creating a test update
table with 400,000 new values hadn't finished after an hour.
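For concreteness, the whole approach might look something like the
sketch below in RODBC (the DSN, data frame, and table/column names are
made up, not from this thread; the UPDATE uses PostgreSQL's join
syntax):

    library(RODBC)
    ch <- odbcConnect("mydsn")

    # One bulk insert builds the updates table; fast = TRUE sends a
    # parameterized INSERT rather than one literal statement per row.
    sqlSave(ch, updates.df, tablename = "updates", rownames = FALSE,
            fast = TRUE)

    # Then a single UPDATE joins against it.
    sqlQuery(ch, "UPDATE bigtable AS a
                  SET col1 = b.col1
                  FROM updates AS b
                  WHERE a.id = b.id")
    odbcClose(ch)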
> I've never had a problem with using large numbers of SQL update
> statements (in the order of hundreds of thousands) to update a table
> and having them complete in a reasonable time (a few minutes). How
> heavily indexed is the field you are updating?
The field being updated isn't indexed; the id field is.
> You may be able to get
> some speed improvements by turning off indices before the update and
> back on again afterwards (highly dependent on your database system
> though).
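(For reference, that suggestion would amount to something like the
following, with a made-up index name and PostgreSQL syntax; it only
helps if the updated column actually carries an index, which isn't the
case here.)

    sqlQuery(ch, "DROP INDEX bigtable_col1_idx")
    sqlQuery(ch, "UPDATE bigtable AS a
                  SET col1 = b.col1
                  FROM updates AS b
                  WHERE a.id = b.id")
    sqlQuery(ch, "CREATE INDEX bigtable_col1_idx ON bigtable (col1)")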
>
> I would strongly suspect your bottleneck lies elsewhere (e.g. when
> generating the statements in R, or using ODBC to send them).
It's not in R (that goes quickly), but I don't know how to distinguish
ODBC slowdowns from database slowdowns. On my machine ODBC is the only
option.
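One rough way to tell the two apart would be to time a trivial round
trip against a single-row UPDATE (query text is illustrative):

    rt <- system.time(sqlQuery(ch, "SELECT 1"))  # pure round trip
    up <- system.time(sqlQuery(ch,
            "UPDATE bigtable SET col1 = col1 WHERE id = 1"))
    # If rt and up are close, connection latency dominates,
    # not the UPDATE itself.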
Part of the problem is probably that I am remote from the server, using
ODBC over an SSH tunnel. But the original "thousands of UPDATEs" was
taking hours running on a machine much closer to the server. It was
using RdbiPgsql; I'm using RODBC. Doing the SELECT takes me a couple of
minutes and is faster locally; why is the UPDATE so slow?
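One avenue still untested here is committing the whole batch once
instead of once per statement, which removes the per-statement commit
cost (though not the network round trips). A sketch, assuming stmts
holds the UPDATE strings generated in R:

    odbcSetAutoCommit(ch, autoCommit = FALSE)
    for (s in stmts) sqlQuery(ch, s)   # still one round trip each
    odbcEndTran(ch, commit = TRUE)     # single commit at the end
    odbcSetAutoCommit(ch, autoCommit = TRUE)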
Duncan