[R-sig-DB] Parameterised queries
Paul Gilbert
pgilbert902 at gmail.com
Sat Feb 14 03:55:48 CET 2015
On 15-02-13 08:49 PM, Tim Keitt wrote:
>
>
> On Fri, Feb 13, 2015 at 5:05 PM, Alex Mandel <tech_dev at wildintellect.com> wrote:
>
> On 02/13/2015 02:53 PM, Paul Gilbert wrote:
> > (Possibly naive question.)
> >
> >> The reason inserts are slow...
> >
> > I've now seen this stated a few times on this thread and I don't fully
> > understand the context everyone seems to be assuming. With a Perl client
> > I can insert multiple records with a single insert command, over a
> > network, and it seems to be nearly as fast as loading with the batch
> > utility locally on the server. As I recall, I had to block fewer than
> > 10,000 records or something choked, but multiple blocks loaded a fairly
> > large database reasonably quickly. (I'm travelling and cannot verify
> > this number at the moment.) It is important to load before creating
> > indexes; otherwise the index gets recalculated with each insert.
> >
> > When people say inserts are slow:
> >
> > -Is this an R specific problem?
> > -Is it assumed indexes are already specified?
> > -Or am I missing something else once again?
> >
> > Thanks,
> > Paul
> >
>
> It's not exactly an R-specific problem, though we are discussing the
> implementation in R.
>
> If you bulk load, generally you want it to be an atomic commit (all
> records or none), and you want to hold off on triggers and
> indexes until after the data has made it to the db. Different databases
> perform differently here.
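
To illustrate the atomic-commit, index-afterwards point, here is roughly what
that looks like through DBI. This is an untested sketch with made-up table and
column names, assuming the RPostgres backend:

  library(DBI)
  con <- dbConnect(RPostgres::Postgres(), dbname = "testdb")

  dbBegin(con)                                    # all rows or none
  dbWriteTable(con, "product", product_df, append = TRUE)
  dbExecute(con, "CREATE INDEX product_name_idx ON product (name)")
  dbCommit(con)                                   # index built once, after the load

  dbDisconnect(con)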
>
> In Postgres you can compare using the psql copy command versus sending an
> SQL file full of INSERT commands.
>
>
> The example for "copy_to" in rpg compares pushing the hflights data to
> postgresql using inserts and copy. You'll notice the insert version
> ('write_table') is commented out because it takes 5 minutes or so and I
> got tired of waiting. Using 'copy_to' takes less than 5 seconds. No indices
> or constraints in operation.
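
For comparison, a rough DBI-based sketch of the same kind of test, not the rpg
code itself and untested; 'df' here is a hypothetical data frame with columns
name and price, and the table names are made up:

  library(DBI)
  con <- dbConnect(RPostgres::Postgres(), dbname = "testdb")

  # Row-by-row INSERT statements, roughly what write_table does:
  system.time(
    for (i in seq_len(nrow(df)))
      dbExecute(con, "INSERT INTO product (name, price) VALUES ($1, $2)",
                params = list(df$name[i], df$price[i]))
  )

  # Bulk path: dbWriteTable() in the postgres backends is COPY-based,
  # the same mechanism copy_to relies on:
  system.time(dbWriteTable(con, "product_copy", df, overwrite = TRUE))

  dbDisconnect(con)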
I don't have the tools to test anything at the moment, but the rpg note
under write_table says "write_table uses SQL INSERT statements".
Just to be clear, I know that issuing one statement per row, as in
  INSERT INTO product (name, price) VALUES ('bread', 1.0);
  ...
  INSERT INTO product (name, price) VALUES ('butter', 2.5);
is slow; it is the multi-row form
  INSERT INTO product (name, price) VALUES
    ('bread', 1.0),
    ...
    ('butter', 2.5);
that I think is fairly fast. (I might have the syntax wrong.)
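From R, one way to build that kind of multi-row statement is just to paste the
value tuples together. A sketch only, with a made-up table name and relying on
DBI::dbQuoteLiteral for quoting:

  library(DBI)
  con <- dbConnect(RPostgres::Postgres(), dbname = "testdb")

  # df has columns name and price; build one INSERT with many value tuples.
  vals <- paste0("(", dbQuoteLiteral(con, df$name), ", ",
                 dbQuoteLiteral(con, df$price), ")", collapse = ", ")
  dbExecute(con, paste("INSERT INTO product (name, price) VALUES", vals))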
Paul
>
>
> More background
> http://en.wikipedia.org/wiki/Prepared_statement
> http://rosettacode.org/wiki/Parametrized_SQL_statement
>
> Not knowing how you wrote your Perl code, you may have been using
> prepared statements, in which case both of your runs, remote or local,
> should have had similar performance. What's being discussed here is the
> introduction of prepared statements, which have not previously existed
> for most DB connectors in R.
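
A sketch of what such a parameterised/prepared insert could look like from R,
using DBI's dbSendStatement()/dbBind() interface, assuming a backend that
supports parameter binding (e.g. RPostgres); the table and values are made up:

  library(DBI)
  con <- dbConnect(RPostgres::Postgres(), dbname = "testdb")

  # Prepare once, then bind values; vectors bind one execution per element.
  stmt <- dbSendStatement(con,
            "INSERT INTO product (name, price) VALUES ($1, $2)")
  dbBind(stmt, list(c("bread", "butter"), c(1.0, 2.5)))
  dbGetRowsAffected(stmt)   # 2
  dbClearResult(stmt)

  dbDisconnect(con)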
>
> Having done bulk loads in the 10 GB+ range before I can tell you it
> matters a lot.
>
>
> I've experimented with using parallel R to run many inserts in parallel.
> Still have not gotten it worked out and I'm not sure it will help in any
> event owing to write contention. Would be interesting to have a good
> test case.
>
> THK
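
If it helps as a starting point for a test case, the shape I would expect such
a parallel load to take is one connection per worker, something like the
following. Untested sketch; the chunk size and core count are arbitrary, the
table is made up, and mclapply forks, so it is Unix-only:

  library(parallel)
  library(DBI)

  # Split df into chunks and append each chunk from its own worker/connection.
  # Write contention on the target table may cancel out any speedup.
  chunks <- split(df, (seq_len(nrow(df)) - 1) %/% 10000)
  mclapply(chunks, function(chunk) {
    con <- dbConnect(RPostgres::Postgres(), dbname = "testdb")
    on.exit(dbDisconnect(con))
    dbWriteTable(con, "product", chunk, append = TRUE)
    nrow(chunk)
  }, mc.cores = 4)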
>
>
> Thanks,
> Alex
>
> _______________________________________________
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>
>
>
>
> --
> http://www.keittlab.org/