[R-sig-DB] RSQLite and transparent compression

Rainer M Krug
Tue Aug 6 15:24:19 CEST 2013


Kasper Daniel Hansen
<kasperdanielhansen using gmail.com> writes:

> Well, the technical question is really whether you can do sqlite
> operations on a compressed database.  Otherwise, all you can do is
> externally compress the database and then decompress it every time you
> want to access it, which may be tedious and slow.  sqlite is used a lot on
> devices with very limited resources, so it is entirely possible that there
> is some compression possibility, which is why I suggest you read the
> documentation (argh!).

Might this help:
http://lserinol.blogspot.fr/2008/02/sqlite-compression.html
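
If the approach there is application-level compression (compress values
before they go into the database, decompress them after reading), the
same idea can be sketched in R with base memCompress()/memDecompress()
and a BLOB column via RSQLite -- a rough illustration only, using a
recent DBI/RSQLite API, with the table and file names made up:

    library(DBI)
    library(RSQLite)

    con <- dbConnect(SQLite(), "compressed_demo.sqlite")
    dbExecute(con, "CREATE TABLE IF NOT EXISTS chunks
                    (id INTEGER PRIMARY KEY, payload BLOB)")

    ## Compress a chunk of text and store it as a BLOB
    txt  <- paste(readLines("some_chunk.txt"), collapse = "\n")
    blob <- memCompress(charToRaw(txt), type = "gzip")
    dbExecute(con, "INSERT INTO chunks (payload) VALUES (:p)",
              params = list(p = list(blob)))

    ## Decompress on the way back out
    raw_out <- dbGetQuery(con, "SELECT payload FROM chunks WHERE id = 1")$payload[[1]]
    txt_out <- rawToChar(memDecompress(raw_out, type = "gzip"))

    dbDisconnect(con)

The obvious cost is that SQL can no longer look inside the compressed
payload, so you lose the ability to subset on those columns in the query
itself.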

Cheers,

Rainer

>
> Finally, 10-20GB for a text file is not that big.  If you do not have enough
> RAM, you must be working on a constrained system.
>
> Kasper
>
>
> On Tue, Aug 6, 2013 at 12:35 AM, Grant Farnsworth <gvfarns using gmail.com> wrote:
>
>> On Tue, Aug 6, 2013 at 12:02 AM, Kasper Daniel Hansen
>> <kasperdanielhansen using gmail.com> wrote:
>> > What do you mean by large?  You are aware you can have an in-memory
>> > version of a SQLite database (whether that helps depends on the size
>> > of course)?  If you operate on a disk based database, fast I/O helps
>> > a lot, perhaps even copying the database to a local drive. I don't
>> > know anything about compression though, but in general I have found
>> > the sqlite.org website and its mailing list to be super helpful.
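
(For what it's worth, the in-memory variant mentioned above is just a
special dbname in RSQLite -- a minimal sketch, with the table and query
invented for illustration:

    library(DBI)
    library(RSQLite)

    ## ":memory:" keeps the whole database in RAM for this connection
    con <- dbConnect(SQLite(), ":memory:")
    dbWriteTable(con, "mtcars", mtcars)
    dbGetQuery(con, "SELECT cyl, AVG(mpg) AS mean_mpg FROM mtcars GROUP BY cyl")
    dbDisconnect(con)

Whether that helps obviously depends on the data fitting in RAM in the
first place.)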
>>
>>
>> Not outrageously large.  I'd say 10-20GB each as delimited text files.
>> Still, it's too large to put in RAM and work with.  This is why I use
>> SQLite.  I get these files as gzipped delimited text files, then I
>> read them a million lines or so at a time using scan(), do some basic
>> clean up, and stuff them into a big SQLite database.  When I want to
>> use the data, I just subset the stuff I need, which fits comfortably
>> into RAM.  If the datasets were small enough, I'd just store them in
>> an R data file...then I wouldn't have to worry about type conversions
>> or variable name issues.
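
Roughly that workflow as a sketch (file name, delimiter, chunk size and
table name below are invented, and readLines()/read.table() stand in for
the scan()-based clean-up described above):

    library(DBI)
    library(RSQLite)

    con <- dbConnect(SQLite(), "big_data.sqlite")
    gz  <- gzfile("raw_data.txt.gz", open = "r")

    repeat {
      lines <- readLines(gz, n = 1e6)      # one chunk of ~a million lines
      if (length(lines) == 0) break
      chunk <- read.table(text = lines, sep = "\t",
                          header = FALSE, stringsAsFactors = FALSE)
      ## ... basic clean-up goes here ...
      dbWriteTable(con, "big_table", chunk, append = TRUE)
    }

    close(gz)
    dbDisconnect(con)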
>>
>> I guess it just seems wasteful to have these huge files sitting around
>> (or move them across networks) when the raw data was compressed and I
>> know the sqlite databases would compress nicely as well.  That's why
>> I'm specifically looking for a compression solution.  I'd be open to
>> other approaches, of course.  For example, I could imagine appending
>> the data to a data frame in an .rda or .rds file and then subsetting
>> it later, without ever loading the whole thing into RAM, if I used
>> some of the big-data packages.  But apart from the file size I'm
>> pretty happy with the SQLite solution---it just seemed like
>> transparent compression might be available, and I was surprised to
>> find that it wasn't.
>>
>> By the way, speed isn't a critical issue.  It's not super
>> time-sensitive work, and the network to my file server is plenty fast.
>> It just seems like I might have missed an obvious way to save the
>> space and transfer time that the lack of compression costs.
>>
>

-- 
Rainer M. Krug

email: RMKrug<at>gmail<dot>com



