[R] INSERT OR UPDATE

Mikkel Grum mi2kelgrum at yahoo.com
Mon May 2 22:03:38 CEST 2011


I'm trying to insert rows of a data.frame into a database table, or update where the key fields of a record already exist in the table. I've come up with a possible solution below, but would like to hear if anyone has a better solution.

# The problem demonstrated:
# Create a data.frame with test values
library(RODBC)
tbl <- data.frame(
			key1 = rep(1:3, each = 2),
			key2 = rep(LETTERS[1:2], 3),
			somevalue = rnorm(6)
		)
		
# Create table in database using the following SQL
CREATE TABLE tbl
(
  key1 integer NOT NULL,
  key2 character varying(1) NOT NULL,
  somevalue double precision,
  CONSTRAINT pktbl PRIMARY KEY (key1, key2)
)

# Continue in R
pg <- odbcConnect("testdb")
sqlSave(pg, tbl[1:2, ], append = TRUE, rownames = FALSE)
sqlSave(pg, tbl[3, ], append = TRUE, rownames = FALSE)

tbl[1, 3] <- 1
sqlUpdate(pg, tbl[1:4, ], index = c("key1", "key2")) # Fails

# Can replace the above sqlUpdate with:
sqlUpdate(pg, tbl[1:3, ], index = c("key1", "key2")) 
sqlSave(pg, tbl[4, ], append = TRUE, rownames = FALSE)

# Proposed solution:
tbl[1, 3] <- 0
tmp <- tbl
yes <- sqlQuery(pg, "SELECT key1, key2 FROM tabl", as.is = TRUE)
for (i in seq(along = present$key1)) {
	sqlUpdate(pg, tmp[tmp$key1 == yes$key1[i] & tmp$key2 == yes$key2[i], ], "tbl", index = c("key1", "key2"))
	tmp <- tmp[!(tmp$key1 == yes$key1[i] & tmp$key2 == yes$key2[i]), ]
}
sqlSave(pg, tmp, "tbl", append = TRUE, rownames = FALSE)

This is fine for small tables, where the need for updates is frequent, and there is no risk of anyone else doing the same thing at the same time. If the table is big and updates are rare, it seems like quite an overhead for what would essential be inserts. Does anyone have a more rational way of doing this with big data sets where updates are rare, e.g. only do it if sqlSave fails?

Is it possible to put a lock on the database while doing the updates and  inserts to avoid problems with concurrency?

I'm working with PostgreSQL, but the example should be generic.

Thanks in advance
Mikkel



More information about the R-help mailing list