[R-sig-DB] concurrent reading/writing in "chunks" with RSQLite (need some help troubleshooting)

Seth Falcon seth at userprimary.net
Mon Jul 5 22:09:59 CEST 2010


Jonathan,

On Mon, Jul 5, 2010 at 12:36 PM, Jonathan Greenberg
<greenberg at ucdavis.edu> wrote:
> Keith and Seth (and r-sig-db'ers):
>
> I somewhat understand the difference between both of your solutions --
> Seth, I'm not a database person (quickly learning tho) but does your
> code hold the output in memory before writing it all at the end (via
> dbCommit), or is each iteration's output being stored in some "waiting
> room" (on disk) where it gets written at the very end?  The key (for
> my purposes) is to avoid keeping more than a chunk's worth of input
> AND output in memory at any one time.

It seems at this point that your simplified example is no longer
sufficient for understanding what you want to achieve.  I'm pretty
sure that data within a transaction will be written to disk (for a
disk-based db) before commit is called.  This is something that will
be determined by SQLite itself -- you can go read up on how things
work.  Or you can do some experiments with your data and see how it
looks.
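To make that concrete, here is a minimal sketch (mine, not code from
this thread) of chunked read-process-write with RSQLite, holding one
chunk of input and output in memory at a time.  The file and table
names ("input.db", "src", "output.db", "dst") and the chunk size are
invented, and it uses the current DBI spellings (dbBegin(), dbFetch());
2010-era RSQLite spelled these dbBeginTransaction() and fetch().
Reading and writing go through separate connections to separate files,
which sidesteps SQLite's file-level locking between a reader and a
writer.

    library(RSQLite)

    ## Hypothetical names: read from table src in input.db,
    ## write transformed chunks to table dst in output.db.
    rcon <- dbConnect(SQLite(), "input.db")   # reader connection
    wcon <- dbConnect(SQLite(), "output.db")  # writer connection

    res <- dbSendQuery(rcon, "SELECT * FROM src")
    dbBegin(wcon)                      # one transaction around all inserts
    while (!dbHasCompleted(res)) {
      chunk <- dbFetch(res, n = 10000) # only this chunk is held in memory
      ## ... transform `chunk` here ...
      dbWriteTable(wcon, "dst", chunk, append = TRUE)
    }
    dbCommit(wcon)
    dbClearResult(res)
    dbDisconnect(rcon)
    dbDisconnect(wcon)

Two files are used here because RSQLite allows only one open result
set per connection, so the pending SELECT would be closed by a write
on the same connection, and a second connection to the same file
would hit SQLite's reader/writer lock.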

Depending on the manipulations you actually want to perform before
inserting the data, you might be able to keep everything in the db.
You could use SQLite extension functions (see RSQLite.extfuns) to
expand what is possible at the SQL level.
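For instance, a sketch of that idea, assuming a table src with a
grouping column grp and a numeric column x (all names invented here):
with the extension functions loaded, aggregates like STDEV become
available in SQL, so a summary table can be built entirely inside
SQLite without pulling rows into R.  In current RSQLite these
functions ship with the package and are loaded via initExtension();
in 2010 they came from the separate RSQLite.extfuns package mentioned
above.

    library(RSQLite)
    con <- dbConnect(SQLite(), "input.db")
    initExtension(con)   # load the math/string extension functions
    ## Build the summary entirely inside the database.
    dbExecute(con,
      "CREATE TABLE summary AS
         SELECT grp, AVG(x) AS mean_x, STDEV(x) AS sd_x
         FROM src GROUP BY grp")
    dbDisconnect(con)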

> Along the lines of Keith's solution, I noticed the COUNT query can be
> VERY slow -- there is no rapid way of getting the number of rows in a
> database table?

No, SQLite does not keep a stored row count for a table that I am
aware of.  Note that COUNT is an aggregate over the rows returned
from a query, and in general there is no way for the db to know in
advance how many rows will match a query.
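Concretely (reusing the invented con and src from the sketches
above), the count query looks like this, and SQLite answers it by
scanning the table or an index, so its cost grows with the number of
rows:

    dbGetQuery(con, "SELECT COUNT(*) FROM src")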

I expect Keith's solution to be quite slow.  It executes a separate
query each time through the loop, and SQLite has to scan linearly
through the table on each pass to apply the OFFSET.
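Since each LIMIT/OFFSET query re-scans all the rows it skips, the
loop is roughly quadratic overall.  One common workaround (a sketch
of mine, not Keith's code; src, the chunk size, and the connection
rcon are invented) is to page on rowid instead, so each chunk starts
with an index seek rather than a scan:

    chunk_size <- 10000
    last <- 0
    repeat {
      chunk <- dbGetQuery(rcon, sprintf(
        "SELECT rowid, * FROM src WHERE rowid > %d ORDER BY rowid LIMIT %d",
        last, chunk_size))
      if (nrow(chunk) == 0) break
      ## ... process `chunk` here ...
      last <- max(chunk$rowid)
    }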

+ seth



