[R-sig-DB] Parameterised queries

Ian Gow |@ndgow @end|ng |rom gm@||@com
Thu Feb 12 15:20:44 CET 2015


Is there merit in using PostgreSQL’s arrays to send data from R to PostgreSQL? (Caveat: I say this having no idea how PL/R gets vectors from R to PostgreSQL.)

PL/R is able to pull vectors in from R, though RPostgreSQL cannot pulls arrays from PostgreSQL as vectors (though it would be great to have this feature).

The second and third approaches below do not differ in performance.

# Getting data one vector at a time

DROP FUNCTION IF EXISTS get_int_vector();

CREATE OR REPLACE FUNCTION get_int_vector()
  RETURNS integer[] AS
$BODY$
     sample(1:10)
$BODY$ LANGUAGE plr;

DROP FUNCTION IF EXISTS get_str_vector();

CREATE OR REPLACE FUNCTION get_str_vector()
  RETURNS text[] AS
$BODY$
     sample(letters[1:10])
$BODY$ LANGUAGE plr;

DROP FUNCTION IF EXISTS get_date_vector();

— For some reason, dates get converted to integers.
CREATE OR REPLACE FUNCTION get_date_vector()
  RETURNS date[] AS
$BODY$
     as.character(seq(from=as.Date("1971-01-01"), by=1, length.out = 10))
$BODY$ LANGUAGE plr;


WITH raw_data AS (
    SELECT get_int_vector() AS ints, 
        get_str_vector() AS strs,
        get_date_vector() AS dates)
SELECT 
    UNNEST(ints) AS int,
    UNNEST(strs) AS str,
    UNNEST(dates) AS date
FROM raw_data;

## Via RPostgreSQL

system.time({
    n <- 1e6
    df <- data.frame(int=sample(1:10, size=n, replace=TRUE),
                     str=sample(letters[1:10], size=n, replace=TRUE),
                     date=as.character(sample(seq(from=as.Date("1971-01-01"), 
                                                  by=1, length.out = 10), 
                                              size=n, replace=TRUE)),
                     stringsAsFactors=FALSE)
    
    library("RPostgreSQL")
    
    pg <- dbConnect(PostgreSQL())
    
    dbWriteTable(pg, "rpostgresql_test", df, row.names=FALSE, overwrite=TRUE)
})

## Using composite type created with table

DROP TABLE IF EXISTS plr_test CASCADE;

CREATE TABLE plr_test (int integer, str text, date date);

CREATE OR REPLACE FUNCTION get_df()
  RETURNS SETOF plr_test AS
$BODY$
    n <- 1e6
    df <- data.frame(int=sample(1:10, size=n, replace=TRUE),
                    str=sample(letters[1:10], size=n, replace=TRUE),
                    date=as.character(sample(seq(from=as.Date("1971-01-01"), 
                                                by=1, length.out = 10), 
                                             size=n, replace=TRUE)),
                    stringsAsFactors=FALSE)
    return(df)
$BODY$ LANGUAGE plr;

INSERT INTO plr_test
SELECT * FROM get_df();


> On Feb 12, 2015, at 6:00 AM, r-sig-db-request using r-project.org wrote:
> 
> Send R-sig-DB mailing list submissions to
> 	r-sig-db using r-project.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
> 	https://stat.ethz.ch/mailman/listinfo/r-sig-db
> or, via email, send a message with subject or body 'help' to
> 	r-sig-db-request using r-project.org
> 
> You can reach the person managing the list at
> 	r-sig-db-owner using r-project.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of R-sig-DB digest..."
> 
> 
> Today's Topics:
> 
>   1. Re: Parameterised queries (Tim Keitt)
>   2. Re: Parameterised queries (Hadley Wickham)
>   3. Re: Parameterised queries (Tim Keitt)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Wed, 11 Feb 2015 15:05:08 -0600
> From: Tim Keitt <tkeitt using utexas.edu>
> To: Hadley Wickham <h.wickham using gmail.com>
> Cc: "r-sig-db using r-project.org" <r-sig-db using r-project.org>
> Subject: Re: [R-sig-DB] Parameterised queries
> Message-ID:
> 	<CANnL8gr1MCX-LFjFHWyoQSg-DD6Tetbp0iBGx0=Ht2LVH1Vp4A using mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
> 
> On Wed, Feb 11, 2015 at 2:41 PM, Hadley Wickham <h.wickham using gmail.com> wrote:
> 
>>>> It gives a new attack vector - to introduce additional data into the
>>>> database, you just need to figure out how to turn a length 1 vector in
>>>> to a length 2 vector.
>>>> 
>>>> It's dangerous in the same way that allowing dbGetQuery() to execute
>>>> multiple queries is dangerous.
>>> 
>>> I'd rather hope that if it were a case that mattered, the user would not
>>> rely on the api as a substitute for appropriate checks.
>> 
>> I think the API should be as safe as possible by default, and
>> sacrificing safety for speed should only be done explicitly when the
>> user asks for it.
>> 
> 
> My use cases are not so sensitive, but I agree with the general idea. Also,
> you really do not gain much over regular looping as inserts are really
> slow, at least in postgresql.
> 
> THK
> 
> 
>> 
>> Hadley
>> 
>> --
>> http://had.co.nz/
>> 
> 
> 
> 
> -- 
> http://www.keittlab.org/
> 
> 	[[alternative HTML version deleted]]
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Wed, 11 Feb 2015 15:07:45 -0600
> From: Hadley Wickham <h.wickham using gmail.com>
> To: Tim Keitt <tkeitt using utexas.edu>
> Cc: "r-sig-db using r-project.org" <r-sig-db using r-project.org>
> Subject: Re: [R-sig-DB] Parameterised queries
> Message-ID:
> 	<CABdHhvE8nJ-0gs=cTxCQNPifdrnBjy-ZAo1d6aWLr7W0Uw8Ddg using mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
>>> I think the API should be as safe as possible by default, and
>>> sacrificing safety for speed should only be done explicitly when the
>>> user asks for it.
>> 
>> My use cases are not so sensitive, but I agree with the general idea. Also,
>> you really do not gain much over regular looping as inserts are really slow,
>> at least in postgresql.
> 
> Yes, I'm just working on that for RPostgres. Parameterised inserts are
> actually slower than sending one giant SQL string. RPostgreSQL uses
> COPY ... FROM STDIN (and manually converts the data frame into a
> single c string) for better performance
> 
> Hadley
> 
> -- 
> http://had.co.nz/
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Wed, 11 Feb 2015 15:11:23 -0600
> From: Tim Keitt <tkeitt using utexas.edu>
> To: Hadley Wickham <h.wickham using gmail.com>
> Cc: "r-sig-db using r-project.org" <r-sig-db using r-project.org>
> Subject: Re: [R-sig-DB] Parameterised queries
> Message-ID:
> 	<CANnL8gpVrM-_HoZMZJi2d2ARAfpEv97uV9JjKxwNpTDaGfDY2Q using mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
> 
> On Wed, Feb 11, 2015 at 3:07 PM, Hadley Wickham <h.wickham using gmail.com> wrote:
> 
>>>> I think the API should be as safe as possible by default, and
>>>> sacrificing safety for speed should only be done explicitly when the
>>>> user asks for it.
>>> 
>>> My use cases are not so sensitive, but I agree with the general idea.
>> Also,
>>> you really do not gain much over regular looping as inserts are really
>> slow,
>>> at least in postgresql.
>> 
>> Yes, I'm just working on that for RPostgres. Parameterised inserts are
>> actually slower than sending one giant SQL string. RPostgreSQL uses
>> COPY ... FROM STDIN (and manually converts the data frame into a
>> single c string) for better performance
>> 
> 
> I put copy_from and copy_to in rpg. I punted on the C interface and simply
> shell out to psql in that case. Works fine with read.csv. and write.csv.
> 
> THK
> 
> 
>> 
>> Hadley
>> 
>> --
>> http://had.co.nz/
>> 
>> _______________________________________________
>> R-sig-DB mailing list -- R Special Interest Group
>> R-sig-DB using r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>> 
> 
> 
> 
> -- 
> http://www.keittlab.org/
> 
> 	[[alternative HTML version deleted]]
> 
> 
> 
> ------------------------------
> 
> Subject: Digest Footer
> 
> _______________________________________________
> R-sig-DB mailing list
> R-sig-DB using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-db
> 
> 
> ------------------------------
> 
> End of R-sig-DB Digest, Vol 119, Issue 5
> ****************************************




More information about the R-sig-DB mailing list