[R-sig-DB] R and PostgreSQL - Writing data?
NISHIYAMA Tomoaki
tomo@k|n @end|ng |rom @t@||@k@n@z@w@-u@@c@jp
Wed Apr 17 02:13:01 CEST 2013
Dear Kevin,
> The problem I'm trying to solve right now is being able to efficiently load 70 million chemical compounds into postgres. I know there are other avenues for accomplishing this, but using R is the best solution in this case.
dbWriteTable() should be used to load all rows of a data frame to PostgreSQL.
This uses a single COPY and should be much faster than calling PQexecPrepared many times.
For prepared statement in RPostgreSQL, I think we should implement some mechanism to
access the prepared statement from R and make use of it by dbGetQuery or dbApply?
functions.
Best regards,
--
Tomoaki NISHIYAMA
Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan
On 2013/04/17, at 8:19, horank01 using ucr.edu wrote:
>
> Hi, I would be interested in implementing what ever is required to support prepared queries. I was thinking of allowing dbSendQuery take a data frame instead of a vector, and then prepare the query once and run it on all rows of the data frame. This is basically what RSQLite does. I have already made a quick modification to RS_PostgreSQL_pqexecParams to call PQexecPrepared instead on an already prepared statement, and that worked. So it seems its mostly a case of modifying the C code to prepare the query first and then read through the data frame calling PQexecPrepared.
> The problem I'm trying to solve right now is being able to efficiently load 70 million chemical compounds into postgres. I know there are other avenues for accomplishing this, but using R is the best solution in this case.
> Please let me know how I can best help, how you want things done, etc. Thanks.
>
> Kevin
>
> On Thursday, December 6, 2012 6:57:22 AM UTC-8, Tomoaki wrote:
> Hi,
>
> PostgreSQL have library function PQexecParams and also supports prepared statements.
> String expansion in the SQL statement is cumbersome for escaping special characters and
> therefore error prone.
>
> I just commited to the SVN repository a very simple and primitive implementation that
> allows to pass vector of characters as parameters.
>
> A sample statement is like:
>
> res <- dbGetQuery(con, "SELECT * FROM rockdata WHERE peri > $1 AND shape < $2 LIMIT $3", c(4000, 0.2, 10))
> print(res)
>
> The syntax for a positional parameter is a dollar sign ($) followed by digits
> rather than a colon followed by digits in PostgreSQL.
> http://www.postgresql.org/docs/9.2/static/sql-syntax-lexical.html#SQL-SYNTAX-SPECIAL-CHARS
>
> This mechanism is required for the support of prepared statements.
> It is nicer if I could make automatic conversions for various type and binary transfer,
> but this is not implemented right now.
> So all parameters are simply passed as strings at the moment.
>
> Note this is the very initial implementation and the interface may change.
>
> Any enhancement, feedback, or test case/program is welcome.
> Especially, on what would be the best interface/syntax.
>
> Best regards,
> --
> Tomoaki NISHIYAMA
>
> Advanced Science Research Center,
> Kanazawa University,
> 13-1 Takara-machi,
> Kanazawa, 920-0934, Japan
>
>
> On 2012/12/01, at 0:26, Dirk Eddelbuettel wrote:
>
> >
> > On 30 November 2012 at 15:05, James David Smith wrote:
> > | Hi all,
> > |
> > | Sorry for the thread re-activation. I was wondering if anyone has
> > | successfully used the syntax below with the library RPostgreSQL?
> >
> > Nope.
> >
> > I always expand the strings explicitly. It would be news to me of that
> > worked. Good news, for sure, but still news...
> >
> > Dirk
> >
> >
> > | dbGetQuery(con, "update foo set sal = :1 where empno = :2",
> > | data = dat[,c("SAL","EMPNO")])
> > |
> > | I've been messing about with it but can't get it to work. I get the error:
> > |
> > | Error in postgresqlQuickSQL(conn, statement, ...) :
> > | unused argument(s) (data = list(bc = c(NA, NA, NA etc.
> > |
> > | Thanks
> > |
> > | James
> > |
> > |
> > |
> > | On 28 September 2012 17:13, Denis Mukhin <denis.x... using oracle.com> wrote:
> > | > James,
> > | >
> > | > I have never tried RPostgreSQL before but in ROracle which is also a DBI based interface you can do something like this:
> > | >
> > | > library(ROracle)
> > | > con <- dbConnect(Oracle(), "scott", "tiger")
> > | > dbGetQuery(con, "create table foo as select * from emp")
> > | >
> > | > dat <- dbGetQuery(con, "select * from foo")
> > | > dat$SAL <- dat$SAL*10
> > | > dbGetQuery(con, "update foo set sal = :1 where empno = :2",
> > | > data = dat[,c("SAL","EMPNO")])
> > | > dbCommit(con)
> > | > dbGetQuery(con, "select * from foo")
> > | >
> > | > dbGetQuery(con, "drop table foo purge")
> > | > dbDisconnect(con)
> > | >
> > | > Denis
> > | >
> > | > -----Original Message-----
> > | > From: Sean Davis [mailto:sda... using mail.nih.gov]
> > | > Sent: Friday, September 28, 2012 11:43 AM
> > | > To: James David Smith
> > | > Cc: r-si... using r-project.org
> > | > Subject: Re: [R-sig-DB] R and PostgreSQL - Writing data?
> > | >
> > | > On Fri, Sep 28, 2012 at 10:36 AM, James David Smith <james.da... using gmail.com> wrote:
> > | >> Hi Sean,
> > | >>
> > | >> Thanks for the reply. I'm familiar with UPDATE queries when working in
> > | >> PostgreSQL, but not from within R. Would it look something like this?
> > | >>
> > | >> dbWriteTable(con, " UPDATE table SET ucam_no2 =
> > | >> 'ucam_no2$interpolated_data' ")
> > | >>
> > | >> My problem is how to get the R data 'within' my SQL statement I think.
> > | >
> > | > To do an update, you'll need to loop through your data.frame and issue a dbSendQuery(). To create the SQL string, I often use something
> > | > like:
> > | >
> > | > sprintf("UPDATE originalTable SET ucam_no2=%f WHERE originalTable.id = %d",....)
> > | >
> > | > You can't do this in one step, unfortunately. This is how UPDATE works and has nothing to do with R.
> > | >
> > | > Sean
> > | >
> > | >
> > | >>
> > | >> On 28 September 2012 15:19, Sean Davis <sda... using mail.nih.gov> wrote:
> > | >>> On Fri, Sep 28, 2012 at 10:14 AM, James David Smith
> > | >>> <james.da... using gmail.com> wrote:
> > | >>>> Dear all,
> > | >>>>
> > | >>>> Sorry if this isn't quite the right place, but it's the first time
> > | >> SendSave NowDiscardDraft autosaved at 15:36 (0 minutes ago) 33% full
> > | >> Using 3.4 GB of your 10.1 GB
> > | >> ��2012 Google - Terms & Privacy
> > | >> Last account activity: 50 minutes ago
> > | >> Details
> > | >> People (2)
> > | >> Sean Davis
> > | >> Add to circles
> > | >>
> > | >> Show details
> > | >> Ads ��� Why these ads?
> > | >> Big Data Too Slow?
> > | >> Real-Time Analytics for Big Data. Visual Drag & Drop UI. Quick & Easy
> > | >> PentahoBigData.com Talend Open Source ESB Open Source ESB Based on
> > | >> Apache CXF and Apache Camel. Free Download!
> > | >> www.talend.com/Free_ESB_Software
> > | >> Warp I/O for SQL Server
> > | >> Speed SQL Server performance 3x Faster I/O, reduced storage
> > | >> www.confio.com/warp-io Storage Container Sussex Ex-Shipping Containers
> > | >> Sale & Hire Storage Container 0800 043 6311
> > | >> www.CsShippingContainers.co.uk More about...
> > | >> MS Access Database SQL ��
> > | >> Database ��
> > | >> Excel Database Query ��
> > | >> Oracle Database Problems ��
> > | >>
> > | >>>> I've posted here. My issue is to do with writing to a PostgreSQL
> > | >>>> database from within R. My situation is best explained by some R
> > | >>>> code to start:
> > | >>>>
> > | >>>> #Connect to the database
> > | >>>> con <- dbConnect(PostgreSQL(), user="postgres", password="password",
> > | >>>> dbname="database")
> > | >>>>
> > | >>>> #Get some data out of the database.
> > | >>>> ucam_no2$original_data <- dbGetQuery(con, "select ucam_no2 FROM
> > | >>>> table")
> > | >>>>
> > | >>>> This returns say 10000 rows of data, but there is only data in about
> > | >>>> half of those rows. What I want to do is interpolate the missing
> > | >>>> data so I do this:
> > | >>>>
> > | >>>> #Generate some data
> > | >>>> ucam_no2$interpolated_data <- na.approx(ucam_data$ucam_no2, na.rm =
> > | >>>> FALSE)
> > | >>>>
> > | >>>> This works well and I now have 10000 rows of data with no empty cells.
> > | >>>> I now want to write this back into my PostgresSQL database. Into the
> > | >>>> same row that I took the data from in the first place. But I don't
> > | >>>> know how. I can write to a new table with something like the below,
> > | >>>> but what I'd really like to do is put the data back into the table I
> > | >>>> got it from.
> > | >>>>
> > | >>>> # Try to write the data back
> > | >>>> dbWriteTable(con, "new_data", ucam_no2$interpolated_data)
> > | >>>
> > | >>> Hi, James.
> > | >>>
> > | >>> You'll need to look into doing a SQL UPDATE. That is the standard
> > | >>> way to "put data back into the table I got it from".
> > | >>>
> > | >>> Sean
> > | >
> > | > _______________________________________________
> > | > R-sig-DB mailing list -- R Special Interest Group R-si... using r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-db
> > | >
> > | > _______________________________________________
> > | > R-sig-DB mailing list -- R Special Interest Group
> > | > R-si... using r-project.org
> > | > https://stat.ethz.ch/mailman/listinfo/r-sig-db
> > |
> > | _______________________________________________
> > | R-sig-DB mailing list -- R Special Interest Group
> > | R-si... using r-project.org
> > | https://stat.ethz.ch/mailman/listinfo/r-sig-db
> >
> > --
> > Dirk Eddelbuettel | e... using debian.org | http://dirk.eddelbuettel.com
> >
> > _______________________________________________
> > R-sig-DB mailing list -- R Special Interest Group
> > R-si... using r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-db
>
[[alternative HTML version deleted]]
More information about the R-sig-DB
mailing list