[R-sig-DB] RSQLite and transparent compression

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Tue Aug 6 14:21:25 CEST 2013


Well, the technical question is really whether you can do SQLite
operations on a compressed database.  Otherwise, all you can do is
externally compress the database and then decompress it every time you
want to access it, which may be tedious and slow.  SQLite is used a lot on
devices with very limited resources, so it is entirely possible that there
is some compression option, which is why I suggest you read the
documentation (argh!).
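
A minimal sketch of that external route with RSQLite, assuming the database
is kept gzip-compressed on disk and decompressed to a temporary file before
each use (the file and table names below are hypothetical):

    ## decompress the gzipped database to a temporary file using base R connections
    library(DBI)
    library(RSQLite)

    gz_path  <- "bigdata.sqlite.gz"              # hypothetical compressed database
    tmp_path <- tempfile(fileext = ".sqlite")

    in_con  <- gzfile(gz_path, open = "rb")
    out_con <- file(tmp_path, open = "wb")
    repeat {
      chunk <- readBin(in_con, what = "raw", n = 1e6)
      if (length(chunk) == 0L) break
      writeBin(chunk, out_con)
    }
    close(in_con); close(out_con)

    ## work with the decompressed copy as usual, then discard it
    db  <- dbConnect(SQLite(), tmp_path)
    res <- dbGetQuery(db, "SELECT count(*) FROM mytable")   # 'mytable' is hypothetical
    dbDisconnect(db)
    unlink(tmp_path)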

Finally, 10-20GB for a text file is not that big.  If you do not have
enough RAM, you must be working on a constrained system.

Kasper


On Tue, Aug 6, 2013 at 12:35 AM, Grant Farnsworth <gvfarns using gmail.com> wrote:

> On Tue, Aug 6, 2013 at 12:02 AM, Kasper Daniel Hansen
> <kasperdanielhansen using gmail.com> wrote:
> > What do you mean by large?  You are aware you can have an in-memory
> > version of a SQLite database (whether that helps depends on the size of
> > course)?  If you operate on a disk based database, fast I/O helps a lot,
> > perhaps even copying the database to a local drive. I don't know anything
> > about compression though, but in general I have found the sqlite.org
> > website and its mailing list to be super helpful.
>
>
> Not outrageously large.  I'd say 10-20GB each as delimited text files.
>  Still, it's too large to put in RAM and work with.  This is why I use
> SQLite.  I get these files as gzipped delimited text files, then I
> read them a million lines or so at a time using scan(), do some basic
> clean up, and stuff them into a big SQLite database.  When I want to
> use the data, I just subset the stuff I need, which fits comfortably
> into RAM.  If the datasets were small enough, I'd just store them in
> an R data file...then I wouldn't have to worry about type conversions
> or variable name issues.
>
> I guess it just seems wasteful to have these huge files sitting around
> (or move them across networks) when the raw data was compressed and I
> know the sqlite databases would compress nicely as well.  That's why
> I'm specifically looking for a compression solution.  I'd be open to
> other approaches, of course.  For example, I could imagine ways to
> append the data into a data frame in an .rda or .rds file and then
> subset it later without ever having to load the whole thing into RAM
> if I used some of the big data packages, but besides the file size I'm
> pretty happy with the SQLite solution---it just seemed like
> transparent zipping might be available and I was surprised to find
> that it wasn't.
>
> By the way, speed isn't a critical issue.  It's not super
> time-sensitive work and the network to my file server is plenty fast.
> It just seems like I might have missed an obvious way to save the
> space and time that lack of compression causes.
>
> _______________________________________________
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>
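
A rough sketch of the chunked load Grant describes above (reading a gzipped
delimited text file a million lines at a time with scan() and appending each
chunk to an SQLite table via RSQLite); the file, table, and column names are
hypothetical, and the clean-up step is left out:

    library(DBI)
    library(RSQLite)

    db  <- dbConnect(SQLite(), "bigdata.sqlite")
    con <- gzfile("rawdata.txt.gz", open = "rt")  # scan() reads gzipped text via a connection

    chunk_size <- 1e6
    repeat {
      ## scan() on an open connection continues from where the previous read stopped
      chunk <- scan(con, sep = "\t", nlines = chunk_size, quiet = TRUE,
                    what = list(id = integer(), date = character(), value = double()))
      if (length(chunk[[1]]) == 0L) break
      chunk <- as.data.frame(chunk, stringsAsFactors = FALSE)
      ## ... basic clean-up would go here ...
      dbWriteTable(db, "mytable", chunk, append = TRUE)
      if (nrow(chunk) < chunk_size) break        # short chunk means we hit the end
    }
    close(con)
    dbDisconnect(db)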
