[R-sig-DB] R and PostgreSQL - Writing data?

NISHIYAMA Tomoaki tomo@k|n @end|ng |rom @t@||@k@n@z@w@-u@@c@jp
Wed Apr 17 02:13:01 CEST 2013


Dear Kevin,

>    The problem I'm trying to solve right now is being able to efficiently load 70 million chemical compounds into postgres. I know there are other avenues for accomplishing this, but using R is the best solution in this case. 

dbWriteTable() should be used to load all rows of a data frame to PostgreSQL.
This uses a single COPY and should be much faster than calling PQexecPrepared many times.

For prepared statement in RPostgreSQL, I think we should implement some mechanism to
access the prepared statement from R and make use of it by dbGetQuery or dbApply?
functions.

Best regards,
-- 
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi, 
Kanazawa, 920-0934, Japan


On 2013/04/17, at 8:19, horank01 using ucr.edu wrote:

> 
> Hi, I would be interested in implementing what ever is required to support prepared queries. I was thinking of allowing dbSendQuery take a data frame instead of a vector, and then prepare the query once and run it on all rows of the data frame. This is basically what RSQLite does. I have already made a quick modification to RS_PostgreSQL_pqexecParams to call PQexecPrepared instead on an already prepared statement, and that worked. So it seems its mostly a case of modifying the C code to prepare the query first and then read through the data frame calling PQexecPrepared. 
>    The problem I'm trying to solve right now is being able to efficiently load 70 million chemical compounds into postgres. I know there are other avenues for accomplishing this, but using R is the best solution in this case. 
>    Please let me know how I can best help, how you want things done, etc. Thanks. 
> 
> Kevin
> 
> On Thursday, December 6, 2012 6:57:22 AM UTC-8, Tomoaki wrote:
> Hi, 
> 
> PostgreSQL have library function PQexecParams and also supports prepared statements. 
> String expansion in the SQL statement is cumbersome for escaping special characters and 
> therefore error prone. 
> 
> I just commited to the SVN repository a very simple and primitive implementation that 
> allows to pass vector of characters as parameters. 
> 
> A sample statement is like: 
> 
>     res <- dbGetQuery(con, "SELECT * FROM rockdata WHERE peri > $1 AND shape < $2 LIMIT $3", c(4000, 0.2, 10)) 
>     print(res) 
> 
> The syntax for a positional parameter is a dollar sign ($) followed by digits 
> rather than a colon followed by digits in PostgreSQL. 
> http://www.postgresql.org/docs/9.2/static/sql-syntax-lexical.html#SQL-SYNTAX-SPECIAL-CHARS 
> 
> This mechanism is required for the support of prepared statements. 
> It is nicer if I could make automatic conversions for various type and binary transfer, 
> but this is not implemented right now. 
> So all parameters are simply passed as strings at the moment. 
> 
> Note this is the very initial implementation and the interface may change. 
> 
> Any enhancement, feedback, or test case/program is welcome. 
> Especially, on what would be the best interface/syntax. 
> 
> Best regards, 
> -- 
> Tomoaki NISHIYAMA 
> 
> Advanced Science Research Center, 
> Kanazawa University, 
> 13-1 Takara-machi, 
> Kanazawa, 920-0934, Japan 
> 
> 
> On 2012/12/01, at 0:26, Dirk Eddelbuettel wrote: 
> 
> > 
> > On 30 November 2012 at 15:05, James David Smith wrote: 
> > | Hi all, 
> > | 
> > | Sorry for the thread re-activation. I was wondering if anyone has 
> > | successfully used the syntax below with the library RPostgreSQL? 
> > 
> > Nope. 
> > 
> > I always expand the strings explicitly. It would be news to me of that 
> > worked.  Good news, for sure, but still news... 
> > 
> > Dirk 
> > 
> > 
> > | dbGetQuery(con, "update foo set sal = :1 where empno = :2", 
> > |            data = dat[,c("SAL","EMPNO")]) 
> > | 
> > | I've been messing about with it but can't get it to work. I get the error: 
> > | 
> > | Error in postgresqlQuickSQL(conn, statement, ...) : 
> > |   unused argument(s) (data = list(bc = c(NA, NA, NA etc. 
> > | 
> > | Thanks 
> > | 
> > | James 
> > | 
> > | 
> > | 
> > | On 28 September 2012 17:13, Denis Mukhin <denis.x... using oracle.com> wrote: 
> > | > James, 
> > | > 
> > | > I have never tried RPostgreSQL before but in ROracle which is also a DBI based interface you can do something like this: 
> > | > 
> > | > library(ROracle) 
> > | > con <- dbConnect(Oracle(), "scott", "tiger") 
> > | > dbGetQuery(con, "create table foo as select * from emp") 
> > | > 
> > | > dat <- dbGetQuery(con, "select * from foo") 
> > | > dat$SAL <- dat$SAL*10 
> > | > dbGetQuery(con, "update foo set sal = :1 where empno = :2", 
> > | >            data = dat[,c("SAL","EMPNO")]) 
> > | > dbCommit(con) 
> > | > dbGetQuery(con, "select * from foo") 
> > | > 
> > | > dbGetQuery(con, "drop table foo purge") 
> > | > dbDisconnect(con) 
> > | > 
> > | > Denis 
> > | > 
> > | > -----Original Message----- 
> > | > From: Sean Davis [mailto:sda... using mail.nih.gov] 
> > | > Sent: Friday, September 28, 2012 11:43 AM 
> > | > To: James David Smith 
> > | > Cc: r-si... using r-project.org 
> > | > Subject: Re: [R-sig-DB] R and PostgreSQL - Writing data? 
> > | > 
> > | > On Fri, Sep 28, 2012 at 10:36 AM, James David Smith <james.da... using gmail.com> wrote: 
> > | >> Hi Sean, 
> > | >> 
> > | >> Thanks for the reply. I'm familiar with UPDATE queries when working in 
> > | >> PostgreSQL, but not from within R. Would it look something like this? 
> > | >> 
> > | >> dbWriteTable(con, " UPDATE table SET ucam_no2 = 
> > | >> 'ucam_no2$interpolated_data' ") 
> > | >> 
> > | >> My problem is how to get the R data 'within' my SQL statement I think. 
> > | > 
> > | > To do an update, you'll need to loop through your data.frame and issue a dbSendQuery().  To create the SQL string, I often use something 
> > | > like: 
> > | > 
> > | > sprintf("UPDATE originalTable SET ucam_no2=%f WHERE originalTable.id = %d",....) 
> > | > 
> > | > You can't do this in one step, unfortunately.  This is how UPDATE works and has nothing to do with R. 
> > | > 
> > | > Sean 
> > | > 
> > | > 
> > | >> 
> > | >> On 28 September 2012 15:19, Sean Davis <sda... using mail.nih.gov> wrote: 
> > | >>> On Fri, Sep 28, 2012 at 10:14 AM, James David Smith 
> > | >>> <james.da... using gmail.com> wrote: 
> > | >>>> Dear all, 
> > | >>>> 
> > | >>>> Sorry if this isn't quite the right place, but it's the first time 
> > | >> SendSave NowDiscardDraft autosaved at 15:36 (0 minutes ago) 33% full 
> > | >> Using 3.4 GB of your 10.1 GB 
> > | >> ��2012 Google - Terms & Privacy 
> > | >> Last account activity: 50 minutes ago 
> > | >> Details 
> > | >> People (2) 
> > | >> Sean Davis 
> > | >> Add to circles 
> > | >> 
> > | >> Show details 
> > | >> Ads ��� Why these ads? 
> > | >> Big Data Too Slow? 
> > | >> Real-Time Analytics for Big Data. Visual Drag & Drop UI. Quick & Easy 
> > | >> PentahoBigData.com Talend Open Source ESB Open Source ESB Based on 
> > | >> Apache CXF and Apache Camel. Free Download! 
> > | >> www.talend.com/Free_ESB_Software 
> > | >> Warp I/O for SQL Server 
> > | >> Speed SQL Server performance 3x Faster I/O, reduced storage 
> > | >> www.confio.com/warp-io Storage Container Sussex Ex-Shipping Containers 
> > | >> Sale & Hire Storage Container 0800 043 6311 
> > | >> www.CsShippingContainers.co.uk More about... 
> > | >> MS Access Database SQL �� 
> > | >> Database �� 
> > | >> Excel Database Query �� 
> > | >> Oracle Database Problems �� 
> > | >> 
> > | >>>> I've posted here. My issue is to do with writing to a PostgreSQL 
> > | >>>> database from within R. My situation is best explained by some R 
> > | >>>> code to start: 
> > | >>>> 
> > | >>>> #Connect to the database 
> > | >>>> con <- dbConnect(PostgreSQL(), user="postgres", password="password", 
> > | >>>> dbname="database") 
> > | >>>> 
> > | >>>> #Get some data out of the database. 
> > | >>>> ucam_no2$original_data <- dbGetQuery(con, "select ucam_no2 FROM 
> > | >>>> table") 
> > | >>>> 
> > | >>>> This returns say 10000 rows of data, but there is only data in about 
> > | >>>> half of those rows. What I want to do is interpolate the missing 
> > | >>>> data so I do this: 
> > | >>>> 
> > | >>>> #Generate some data 
> > | >>>> ucam_no2$interpolated_data <- na.approx(ucam_data$ucam_no2, na.rm = 
> > | >>>> FALSE) 
> > | >>>> 
> > | >>>> This works well and I now have 10000 rows of data with no empty cells. 
> > | >>>> I now want to write this back into my PostgresSQL database. Into the 
> > | >>>> same row that I took the data from in the first place. But I don't 
> > | >>>> know how. I can write to a new table with something like the below, 
> > | >>>> but what I'd really like to do is put the data back into the table I 
> > | >>>> got it from. 
> > | >>>> 
> > | >>>> # Try to write the data back 
> > | >>>> dbWriteTable(con, "new_data", ucam_no2$interpolated_data) 
> > | >>> 
> > | >>> Hi, James. 
> > | >>> 
> > | >>> You'll need to look into doing a SQL UPDATE.  That is the standard 
> > | >>> way to "put data back into the table I got it from". 
> > | >>> 
> > | >>> Sean 
> > | > 
> > | > _______________________________________________ 
> > | > R-sig-DB mailing list -- R Special Interest Group R-si... using r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-db 
> > | > 
> > | > _______________________________________________ 
> > | > R-sig-DB mailing list -- R Special Interest Group 
> > | > R-si... using r-project.org 
> > | > https://stat.ethz.ch/mailman/listinfo/r-sig-db 
> > | 
> > | _______________________________________________ 
> > | R-sig-DB mailing list -- R Special Interest Group 
> > | R-si... using r-project.org 
> > | https://stat.ethz.ch/mailman/listinfo/r-sig-db 
> > 
> > -- 
> > Dirk Eddelbuettel | e... using debian.org | http://dirk.eddelbuettel.com   
> > 
> > _______________________________________________ 
> > R-sig-DB mailing list -- R Special Interest Group 
> > R-si... using r-project.org 
> > https://stat.ethz.ch/mailman/listinfo/r-sig-db 
> 


	[[alternative HTML version deleted]]




More information about the R-sig-DB mailing list