[R] Compress string memCompress/Decompress

Matt Shotwell shotwelm at musc.edu
Sun Jul 11 20:31:43 CEST 2010


On Fri, 2010-07-09 at 20:02 -0400, Erik Wright wrote:
> Hi Matt,
> 
> This works great, thanks!
> 
> At first I got an error message saying BLOB is not implemented in RSQLite.  When I updated to the latest version it worked.

SQLite itself has supported BLOBs since version 3.0.

> 
> Is there any reason the string needs to be stored as type BLOB?  It seems to work the same when I swap "BLOB" with "TEXT" in the CREATE TABLE command.

SQLite has a dynamic type system. That is, data types are associated
with values rather than with their container (column). This means that
most columns in a table can store values of types other than the one
(the 'affinity') they are declared with. I think that's what happens
when you use TEXT rather than BLOB. If you use something like x'FFFFA9'
to insert data into a column with TEXT affinity, I believe it is stored
as a BLOB regardless.
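
One way to check this is to ask SQLite for the storage class of each
value with its typeof() function. The following is only a sketch: the
table name "t", the in-memory database, and the use of dbExecute() from
a current DBI/RSQLite are assumptions for illustration, not what was
run earlier in this thread.

```r
library(DBI)
library(RSQLite)

# Throwaway in-memory database (assumption: current DBI/RSQLite installed)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table whose data column is declared TEXT, not BLOB
dbExecute(con, "CREATE TABLE t (id INTEGER, data TEXT);")

# Row 1 is inserted as a blob literal, row 2 as an ordinary string
dbExecute(con, "INSERT INTO t (id, data) VALUES (1, x'FFFFA9');")
dbExecute(con, "INSERT INTO t (id, data) VALUES (2, 'plain text');")

# typeof() reports the storage class of each stored value
types <- dbGetQuery(con,
  "SELECT id, typeof(data) AS storage FROM t ORDER BY id;")
print(types)

dbDisconnect(con)
```

If the blob literal keeps its type, the storage column should read
"blob" for row 1 and "text" for row 2, even though the column itself
was declared TEXT.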

-Matt

> Thanks again!,
> Erik
> 
> 
> 
> On Jul 9, 2010, at 3:21 PM, Matt Shotwell wrote:
> 
> > Erik, 
> > 
> > Can you store the data as a blob? For example:
> > 
> >> #create string, compress with gzip, convert to SQLite blob string
> >> string <- "gzip this string, store as blob in SQLite database"
> >> string.gz <- memCompress(string, type="gzip")
> >> string.sqlite <- paste("x'",paste(string.gz,collapse=""),"'",sep="")
> > 
> >> #create database and table with a BLOB column
> >> library(RSQLite)
> > Loading required package: DBI
> >> con <- dbConnect(dbDriver("SQLite"), "compress.sqlite")
> >> dbGetQuery(con, "CREATE TABLE Compress (id INTEGER, data BLOB);")
> > NULL
> > 
> >> #insert the string as a blob
> >> query <- paste("INSERT INTO Compress (id, data) VALUES (1, ", 
> > + string.sqlite, ");", sep="")
> >> dbGetQuery(con, query)
> > NULL
> > 
> >> #recover the blob, decompress, and convert back to a string
> >> result <- dbGetQuery(con, "SELECT data FROM Compress;")
> >> string.gz <- result[[1]][[1]]
> >> string <- memDecompress(string.gz, type="gzip")
> >> rawToChar(string)
> > [1] "gzip this string, store as blob in SQLite database"
> > 
> > 
> > -Matt
> > 
> > 
> > 
> > On Fri, 2010-07-09 at 12:51 -0400, Erik Wright wrote:
> >> Hello,
> >> 
> >> I would like to compress a long string (character vector), store the compressed string in the text field of a SQLite database (using RSQLite), and then load the text back into memory and decompress it back into the original string.  My character vector can be compressed considerably using standard gzip/bzip2 compression.  In theory it should be much faster for me to compress/decompress a long string than to write the whole string to the hard drive and then read it back (not to mention the saved hard drive space).
> >> 
> >> I have tried accomplishing this task using memCompress() and memDecompress() without success.  It seems memCompress can only convert a character vector to raw type which cannot be treated as a string.  Does anyone have ideas on how I can go about doing this, especially using the standard base packages?
> >> 
> >> Thanks!,
> >> Erik
> >> 
> >> 
> >>> sessionInfo()
> >> R version 2.11.0 (2010-04-22) 
> >> x86_64-apple-darwin9.8.0 
> >> 
> >> locale:
> >> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> >> 
> >> attached base packages:
> >> [1] stats     graphics  grDevices utils     datasets  methods   base     
> >> 
> >> loaded via a namespace (and not attached):
> >> [1] tools_2.11.0
> >> 
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> > -- 
> > Matthew S. Shotwell
> > Graduate Student
> > Division of Biostatistics and Epidemiology
> > Medical University of South Carolina
> > http://biostatmatt.com
> > 
> 
-- 
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
http://biostatmatt.com


