[R-sig-DB] Parameterised queries

Tim Keitt tke|tt @end|ng |rom utex@@@edu
Sat Feb 14 02:49:25 CET 2015


On Fri, Feb 13, 2015 at 5:05 PM, Alex Mandel <tech_dev using wildintellect.com>
wrote:

> On 02/13/2015 02:53 PM, Paul Gilbert wrote:
> > (Possibly naive question.)
> >
> >> The reason inserts are slow...
> >
> > I've now seen this stated a few times on this thread and I don't fully
> > understand the context everyone seems to be assuming. With a Perl client
> > I can insert multiple records with a single insert command, over a
> > network, and it seems to be nearly as fast as loading with the batch
> > utility locally on the server. As I recall, I had to block less that
> > 10,000 records or something choked, but multiple blocks loaded a fairly
> > large database reasonably quickly. (I'm travelling and cannot verify
> > this number at the moment.) It is import to load before specifying
> > indexes, otherwise the index gets recalculated with each insert.
> >
> > When people say inserts  are slow:
> >
> > -Is this an R specific problem?
> > -Is it assumed indexes are already specified?
> > -Or am I missing something else once again?
> >
> > Thanks,
> > Paul
> >
>
> It's not exactly R specific, though we are discussing the implementation
> in R.
>
> If you bulk load generally you want it to be an atomic commit (all
> records or none), and you want it to hold of on doing triggers and
> indexes until after the data has made it to the db. Different Dbs have
> different performance levels.
>
> In postgres you can compare using the psql copy command vs sending a sql
> file with INSERT commands.
>

The example for "copy_to" in rpg compares pushing the hflights db to
postgresql using inserts and copy. You'll notice the insert version
('write_table') is commented out because it takes 5 minutes or so and I got
tired of waiting. Using 'copy_to' is less than 5 seconds. No indices or
constraints in operation.


>
> More background
> http://en.wikipedia.org/wiki/Prepared_statement
> http://rosettacode.org/wiki/Parametrized_SQL_statement
>
> Not knowing how you wrote your Perl code, you may have been using
> prepared statements. In which case both of your runs remote or local
> should of had similar performance. What's being discussed is the
> introduction of prepared statements which has not existed for most Db
> connectors in R previously.
>
> Having done bulk loads in the 10 GB+ range before I can tell you it
> matters a lot.
>

I've experimented with using parallel R to run many inserts in parallel.
Still have not gotten it worked out and I'm not sure it will help in any
event owing to write contention. Would be interesting to have a good test
case.

THK


>
> Thanks,
> Alex
>
> _______________________________________________
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>



-- 
http://www.keittlab.org/

	[[alternative HTML version deleted]]




More information about the R-sig-DB mailing list