[R-sig-DB] Parameterised queries

Paul Gilbert pg||bert902 @end|ng |rom gm@||@com
Sat Feb 14 03:55:48 CET 2015



On 15-02-13 08:49 PM, Tim Keitt wrote:
>
>
> On Fri, Feb 13, 2015 at 5:05 PM, Alex Mandel <tech_dev using wildintellect.com
> <mailto:tech_dev using wildintellect.com>> wrote:
>
>     On 02/13/2015 02:53 PM, Paul Gilbert wrote:
>     > (Possibly naive question.)
>     >
>     >> The reason inserts are slow...
>     >
>     > I've now seen this stated a few times on this thread and I don't fully
>     > understand the context everyone seems to be assuming. With a Perl client
>     > I can insert multiple records with a single insert command, over a
>     > network, and it seems to be nearly as fast as loading with the batch
>     > utility locally on the server. As I recall, I had to block less that
>     > 10,000 records or something choked, but multiple blocks loaded a fairly
>     > large database reasonably quickly. (I'm travelling and cannot verify
>     > this number at the moment.) It is import to load before specifying
>     > indexes, otherwise the index gets recalculated with each insert.
>     >
>     > When people say inserts  are slow:
>     >
>     > -Is this an R specific problem?
>     > -Is it assumed indexes are already specified?
>     > -Or am I missing something else once again?
>     >
>     > Thanks,
>     > Paul
>     >
>
>     It's not exactly R specific, though we are discussing the implementation
>     in R.
>
>     If you bulk load generally you want it to be an atomic commit (all
>     records or none), and you want it to hold of on doing triggers and
>     indexes until after the data has made it to the db. Different Dbs have
>     different performance levels.
>
>     In postgres you can compare using the psql copy command vs sending a sql
>     file with INSERT commands.
>
>
> The example for "copy_to" in rpg compares pushing the hflights db to
> postgresql using inserts and copy. You'll notice the insert version
> ('write_table') is commented out because it takes 5 minutes or so and I
> got tired of waiting. Using 'copy_to' is less than 5 seconds. No indices
> or constraints in operation.

I don't have the tools to test anything at the moment, but the rpg note 
under write_tables says "write_table uses SQL INSERT statements".

Just to be clear, I know

     INSERT INTO PRODUCT (name, price) VALUES (bread, 1.0)
     ...
     INSERT INTO PRODUCT (name, price) VALUES (butter, 2.5)

is slow, it is

     INSERT INTO PRODUCT (name, price) VALUES (
       (bread, 1.0),
        ...
       (butter, 2.5))

that I think is fairly fast. (I might have the syntax wrong.)

Paul
>
>
>     More background
>     http://en.wikipedia.org/wiki/Prepared_statement
>     http://rosettacode.org/wiki/Parametrized_SQL_statement
>
>     Not knowing how you wrote your Perl code, you may have been using
>     prepared statements. In which case both of your runs remote or local
>     should of had similar performance. What's being discussed is the
>     introduction of prepared statements which has not existed for most Db
>     connectors in R previously.
>
>     Having done bulk loads in the 10 GB+ range before I can tell you it
>     matters a lot.
>
>
> I've experimented with using parallel R to run many inserts in parallel.
> Still have not gotten it worked out and I'm not sure it will help in any
> event owing to write contention. Would be interesting to have a good
> test case.
>
> THK
>
>
>     Thanks,
>     Alex
>
>     _______________________________________________
>     R-sig-DB mailing list -- R Special Interest Group
>     R-sig-DB using r-project.org <mailto:R-sig-DB using r-project.org>
>     https://stat.ethz.ch/mailman/listinfo/r-sig-db
>
>
>
>
> --
> http://www.keittlab.org/




More information about the R-sig-DB mailing list