[R-sig-DB] RSQLite and transparent compression

Sean Davis @d@v|@2 @end|ng |rom m@||@n|h@gov
Tue Aug 6 16:15:23 CEST 2013


On Tue, Aug 6, 2013 at 9:24 AM, Rainer M Krug <Rainer using krugs.de> wrote:
> Kasper Daniel Hansen
> <kasperdanielhansen using gmail.com> writes:
>
>> Well, the technical question is really whether you can do sqlite
>> operations on a compressed database.  Otherwise, all you can do is
>> externally compress the database and then decompress it every time you
>> want to access it, which may be tedious and slow.  sqlite is used a lot on
>> devices with very limited resources, so it is entirely possible that there
>> is some compression option, which is why I suggest you read the
>> documentation (argh!).
>
> Might this help:
> http://lserinol.blogspot.fr/2008/02/sqlite-compression.html
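
Kasper's fallback keeps the database gzip-compressed on disk and decompresses a working copy each time it is needed. It can be sketched with Python's standard library (`gzip`, `shutil`, `sqlite3`, `tempfile`). This is a minimal illustration under that assumption, not an RSQLite or SQLite feature. The helper names `compress_db` and `open_compressed_db` are hypothetical. The sketch makes his cost concrete: every access pays for a full decompression of the file, and writes are discarded unless the working copy is recompressed.

```python
import contextlib
import gzip
import os
import shutil
import sqlite3
import tempfile


def compress_db(db_path, gz_path):
    """Write a gzip-compressed copy of an SQLite database file.

    Hypothetical helper; close all connections to db_path first so the
    file on disk is complete.
    """
    with open(db_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


@contextlib.contextmanager
def open_compressed_db(gz_path):
    """Decompress gz_path to a temporary file and yield a connection to it.

    Hypothetical helper.  The temporary copy is deleted on exit, so any
    changes are lost unless it is recompressed before the block ends.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".sqlite")
    try:
        # Full decompression on every open: this is the slow part.
        with os.fdopen(fd, "wb") as dst, gzip.open(gz_path, "rb") as src:
            shutil.copyfileobj(src, dst)
        conn = sqlite3.connect(tmp_path)
        try:
            yield conn
        finally:
            conn.close()
    finally:
        os.remove(tmp_path)
```

Usage is `with open_compressed_db("big.sqlite.gz") as conn:` (hypothetical file name), after which queries run against a temporary decompressed copy. That copy is deleted when the block exits.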

Short of an SQLite plugin, you might look at using a FUSE-based
compressed filesystem to store the SQLite database.  I have no idea
how well this plays with sqlite or what the performance will be, but
it should be simple enough to test.

http://sourceforge.net/apps/mediawiki/fuse/?title=CompressedFileSystems

Sean



> Cheers,
>
> Rainer
>
>>
>> Finally, 10-20GB for a text file is not that big.  If you do not have enough
>> RAM, you must be working on a constrained system.
>>
>> Kasper
>>
>>
>> On Tue, Aug 6, 2013 at 12:35 AM, Grant Farnsworth <gvfarns using gmail.com> wrote:
>>
>>> On Tue, Aug 6, 2013 at 12:02 AM, Kasper Daniel Hansen
>>> <kasperdanielhansen using gmail.com> wrote:
>>> > What do you mean by large?  Are you aware that you can have an in-memory
>>> > version of a SQLite database (whether that helps depends on the size, of
>>> > course)?  If you operate on a disk-based database, fast I/O helps a lot;
>>> > perhaps even copy the database to a local drive.  I don't know anything
>>> > about compression, but in general I have found the sqlite.org website and
>>> > its mailing list to be super helpful.
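
Kasper's in-memory suggestion can be illustrated with Python's `sqlite3` module. Its `Connection.backup` method (Python 3.7+) copies a whole database from one connection to another. The helper name `load_into_memory` is hypothetical. The sketch loads an on-disk file into a `:memory:` database. This only pays off when the database fits in RAM, which Grant's reply explains is not the case for his full files.

```python
import sqlite3


def load_into_memory(db_path):
    """Copy an on-disk SQLite database into a new in-memory database.

    Hypothetical helper; requires Python 3.7+ for Connection.backup.
    The returned connection is independent of the file on disk.
    """
    disk = sqlite3.connect(db_path)
    mem = sqlite3.connect(":memory:")
    try:
        disk.backup(mem)  # copy every page of the "main" database into mem
    finally:
        disk.close()
    return mem
```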
>>>
>>>
>>> Not outrageously large.  I'd say 10-20GB each as delimited text files.
>>> Still, they're too large to put in RAM and work with.  This is why I use
>>> SQLite.  I get these files as gzipped delimited text files, then I
>>> read them a million lines or so at a time using scan(), do some basic
>>> clean-up, and stuff them into a big SQLite database.  When I want to
>>> use the data, I just subset the parts I need, which fit comfortably
>>> into RAM.  If the datasets were small enough, I'd just store them in
>>> an R data file; then I wouldn't have to worry about type conversions
>>> or variable name issues.
>>>
>>> I guess it just seems wasteful to have these huge files sitting around
>>> (or to move them across networks) when the raw data was compressed and I
>>> know the SQLite databases would compress nicely as well.  That's why
>>> I'm specifically looking for a compression solution.  I'd be open to
>>> other approaches, of course.  For example, I could imagine ways to
>>> append the data to a data frame in an .rda or .rds file and then
>>> subset it later without ever having to load the whole thing into RAM
>>> if I used some of the big data packages, but apart from the file size I'm
>>> pretty happy with the SQLite solution.  It just seemed like
>>> transparent zipping might be available, and I was surprised to find
>>> that it wasn't.
>>>
>>> By the way, speed isn't a critical issue.  It's not super
>>> time-sensitive work, and the network to my file server is plenty fast.
>>> It just seems like I might have missed an obvious way to save the
>>> space and time that the lack of compression costs me.
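
Grant's loading step has a close analogue in Python's standard library: read gzipped delimited text roughly a million lines at a time, clean it, append it to SQLite, and later query only the subset needed. This sketch is not his R code. `csv`/`gzip` streaming stands in for `scan()`. The function name `load_gzipped_delimited` and the `(name, type)` column spec are hypothetical. The file is assumed to have a header line, and whitespace stripping stands in for his unspecified clean-up. Declaring column types lets SQLite's type affinity convert the text fields on insert, which touches on the type-conversion issue he mentions.

```python
import csv
import gzip
import itertools
import sqlite3


def load_gzipped_delimited(gz_path, conn, table, columns,
                           delimiter="\t", chunk_size=1_000_000):
    """Append a gzipped delimited text file to an SQLite table in chunks.

    Hypothetical helper.  `columns` is a list of (name, sqlite_type) pairs;
    `table` and the column names are assumed to be trusted identifiers,
    because they are interpolated into the SQL text.  The file is assumed
    to start with a header line, which is skipped.  Returns the row count.
    """
    col_defs = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    names = ", ".join(name for name, _ in columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})")
    insert_sql = f"INSERT INTO {table} ({names}) VALUES ({marks})"

    total = 0
    with gzip.open(gz_path, "rt", newline="") as fh:  # decompress on the fly
        reader = csv.reader(fh, delimiter=delimiter)
        next(reader, None)  # skip the header line
        while True:
            # Hold at most chunk_size rows in memory at once.
            chunk = list(itertools.islice(reader, chunk_size))
            if not chunk:
                break
            # Stand-in for "basic clean-up": strip whitespace from each field.
            rows = [[field.strip() for field in row] for row in chunk]
            conn.executemany(insert_sql, rows)
            conn.commit()
            total += len(rows)
    return total
```

Once the data is loaded, a subset is an ordinary query. For a hypothetical table `obs`, that would be `conn.execute("SELECT id, value FROM obs WHERE value >= 20").fetchall()`.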
>>>
>>> _______________________________________________
>>> R-sig-DB mailing list -- R Special Interest Group
>>> R-sig-DB using r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>>>
>>
>
> --
> Rainer M. Krug
>
> email: RMKrug<at>gmail<dot>com