[R-sig-DB] Serialize data.frame to database

Simon Zehnder @zehnder @end|ng |rom un|-bonn@de
Tue Jul 16 22:58:33 CEST 2013


So I found finally a way how it can be done.... don't know though if it can be called efficient:

library(RSQLite)
con <- dbDriver("SQLite")
db <- dbConnect(con, dbname = "test")
dbSendQuery("CREATE TABLE frames(simID INT, data TEXT)")
iris.ascii <- serialize(head(iris), NULL, ascii = TRUE)
dbSendQuery(db, paste("INSERT INTO frames VALUES(1, '", rawToChar(iris.ascii), "')", sep = ""))
res <- dbGetQuery("SELECT * FROM frames WHERE simID = 1")
unserialize(charToRaw(res$data)) 

dbDisconnect(db)
sqliteCloseDriver(con)

WIth this, an R data.frame can be stored into ONE ROW of a database table and therefore be related to an ID for which it can be searched quickly inside the database. Furthermore a pure character (or long character) does not consume much space (It can also be used a BLOB instead of a TEXT type in the CREATE TABLE query).

Still, there are some questions left to me: 

1. Why isn't it possible to convert a binary raw vector to a character?

iris.bin <- serialize(head(iris), NULL, ascii = FALSE)
rawToChar(iris.bin)
Error in rawToChar(iris.bin)
  embedded nul in string: X'...

2. Is a binary raw more efficient than an ascii? If so why?

	See https://stat.ethz.ch/pipermail/r-sig-db/2008q1/000423.html from Dirk Eddelbuettel 

3. Why aren't BLOBs supported for matrices, vectors, dataframes in R directly? Isn't this the usual way to store for example arrays from Java, C, C++, Python, etc. in databases, like serialize to binary, store, fetch unserialize?

4. How efficient is this solution really (independent from the problem to be solved by a database solution)? It is one simple query: Go to the line with simID = 1 take what is in the second field. 


Best 

Simon

On Jul 16, 2013, at 10:39 AM, Simon Zehnder <szehnder using uni-bonn.de> wrote:

> Dear all,
> 
> I need a very fast and reliable database solution so I try to serialize a data.frame (to binary data) and to store this data to an SQLite database (I might be wrong with this approach and a data.frame can be efficiently and faster stored and fetched in a database row in a different manner - please feel free to make suggestions. It is just what I know from Java how to store a matrix). 
> 
> This is what I tried to do:
> 
> library(RSQLite)
> con <- dbDriver("SQLite")
> db <- dbConnect(con, "test")
> dbSendQuery(db, 'CREATE TABLE frames("simID" INT, "data" BLOB)')
> data.bin <- serialize(iris, NULL, ascii = FALSE)
> dbSendQuery(db, paste("INSERT INTO frames VALUES(1, X'", data.bin, "')", sep = ""))
> data.bin2 <- dbGetQuery(db, "SELECT DATA FROM frames WHERE simID = 1")
> data.bin2
> data
> 1   58
> 
> So, only the first entry of data.bin is saved to the database. I tried to first convert the binary data to raw data:
> data.raw <- rawToChar(data.bin)
> Error in rawToChar(data.bin) :
> embedded nul in string: 'X\n\0\0\0\002\0\003\0\001\0\002\003\0\0\0\003\023\0\0\0\005\0\0\0\016\0\0\0\x96@\024ffffff@\023\x99\x99\x99\x99\x99\x9a@\022\xcc\xcc\xcc\xcc\xcc\xcd@\022ffffff@\024\0\0\0\0\0\0@\025\x99\x99\x99\x99\x99\x9a@\022ffffff@\024\0\0\0\0\0\0@\021\x99\x99\x99\x99\x99\x9a@\023\x99\x99\x99\x99\x99\x9a@\025\x99\x99\x99\x99\x99\x9a@\023333333@\023333333@\021333333@\027333333@\026\xcc\xcc\xcc\xcc\xcc\xcd@\025\x99\x99\x99\x99\x99\x9a@\024ffffff@\026\xcc\xcc\xcc\xcc\xcc\xcd@\024ffffff@\025\x99\x99\x99\x99\x99\x9a@\024ffffff@\022ffffff@\024ffffff@\023333333@\024\0\0\0\0\0\0@\024\0\0\0\0\0\0@\024\xcc\xcc\xcc\xcc\xcc\xcd@\024\xcc\xcc\xcc\xcc\xcc\xcd@\022\xcc\xcc\xcc\xcc\xcc\xcd@\023333333@\025\x99\x99\x99\x99\x99\x9a@\024\xcc\xcc\xcc\xcc\xcc\xcd@\026\0\0\0\0\0\0@\023\x99\x99\x99\x99\x99\x9a@\024\0\0\0\0\0\0@\026\0\0\0\0\0\0@\023\x99\x99\x99\x99\x99\x9a@\021\x99\x99\x99\x99\x99\x9a@\024ffffff@\024\0\0\0\0\0\0@\022\0\0\0\0\0\0@\021\x99\x99\x99\x99\x99\x9a@\024\0\0\0\0\0!
> \0
> 
> I don't know what this error should tell me. Then I tried to use the ASCII format
> 
> data.ascii <- serialize(iris, NULL, ascii = TRUE)
> data.raw <- rawToChar(data.ascii)
> dbSendQuery(db, "DELETE FROM frames")
> dbSendQuery(db, paste("INSERT INTO frames VALUES(1, X'", data.raw, "')", sep = ""))
> Error in sqliteExecStatement(conn, statement, ...) :
> RS-DBI driver: (error in statement: unrecognized token: "X'A
> 
> This also does not work. It seems the driver does not deal that nicely with the regular INSERT query for BLOB objects in SQLite. Then I used a simpler way:
> 
> dbSendQuery(db, "DELETE FROM frames")
> dbSendQuery(db, "DROP TABLE frames")
> dbSendQuery(db, 'CREATE TABLE frames("simID" INT, "data" TEXT DEFAULT NULL)')
> dbSendQuery(db, paste("INSERT INTO frames VALUES(1, '", data.raw, "')", sep = ""))
> data.bin2 <- dbGetQuery(db, "SELECT data FROM frames WHERE simID = 1")
> 
> Nice, that worked. Now I want to unserialize the data:
> 
> unserialize(data.bin2)
> Error in unserialize(data.bin2) : 'connection' must be a connection
> 
> unserialize(data.bin2[1, 'data'])
> Error in unserialize(data.bin2[1, "data"]) :
> character vectors are no longer accepted by unserialize()
> 
> I feel a little stuck here, but I am very sure, that converting data.frames to binary data and storing them to a database is not that unusual. So I hope somebody has already done this and could give me the missing piece.
> 
> 
> Best
> 
> Simon
> 
> _______________________________________________
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-db




More information about the R-sig-DB mailing list