[R-sig-DB] R and PostgreSQL - Writing data?

NISHIYAMA Tomoaki tomo@k|n @end|ng |rom @t@||@k@n@z@w@-u@@c@jp
Thu Apr 18 02:38:35 CEST 2013


Dear Kevin,

> The problem I have though is that I need to leave the primary key
> field un-specified so that it will fill in the key from a sequence.


I do not understand why you need to leave the primary key unspecified.
You can change the field characteristics with ALTER TABLE, ALTER SEQUENCE,
and so on.  To see what operation is needed you can prepare a small
example table and use pg_dump.

"row.names" are copied by dbWriteTable
to ensure that every record is distinguishable.
It is by default a text field.
If you need a serial primary key, you may create a new column, and compute
the initial values and add constraints and index. 
(or just alter the column may work depending on the data.frame)

Note that pg_dump/restore should have been very well tested by 
PostgreSQL developers to ensure the reliability and the speed.
So, it would be a very hard to invent a better method.

>   So, right now it executes "COPY tablename FROM", but could it be changed to grab
> the list of fields from the given data frame and then add them to the copy command,
> like so: "COPY tablename (col1, col2, ... ) FROM"?


What would be the calling convention of dbWriteTable, then?
Current one is very simple, that is,
dbWrtieTable(con, tablename, data.frame)

If you want to write the list of columns, then wouldn't 
it easier to make a data.frame that have only those columns?
-- 
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi, 
Kanazawa, 920-0934, Japan


On 2013/04/18, at 2:50, khoran using globalrecordings.net wrote:

> Tomoaki,
>    dbWriteTable could work, I'd not looked too closely at it before. The problem I have though is that I need to leave the primary key field un-specified so that it will fill in the key from a sequence. Using a column of NA values does not work. This could be fixed by having dbWriteTable explicitly state the list of columns given in the data frame in the COPY command.  So, right now it executes "COPY tablename FROM", but could it be changed to grab the list of fields from the given data frame and then add them to the copy command, like so: "COPY tablename (col1, col2, ... ) FROM"? Then I would not need any prepared statements though ....
> 
> Thanks
> 
> Kevin
> 
> On Tuesday, April 16, 2013 5:13:01 PM UTC-7, Tomoaki wrote:
> Dear Kevin,
> 
>>    The problem I'm trying to solve right now is being able to efficiently load 70 million chemical compounds into postgres. I know there are other avenues for accomplishing this, but using R is the best solution in this case. 
> 
> dbWriteTable() should be used to load all rows of a data frame to PostgreSQL.
> This uses a single COPY and should be much faster than calling PQexecPrepared many times.
> 
> For prepared statement in RPostgreSQL, I think we should implement some mechanism to
> access the prepared statement from R and make use of it by dbGetQuery or dbApply?
> functions.
> 
> Best regards,
> -- 
> Tomoaki NISHIYAMA
> 
> Advanced Science Research Center,
> Kanazawa University,
> 13-1 Takara-machi, 
> Kanazawa, 920-0934, Japan
> 
> 
> On 2013/04/17, at 8:19, hora... using ucr.edu wrote:
> 
>> 
>> Hi, I would be interested in implementing what ever is required to support prepared queries. I was thinking of allowing dbSendQuery take a data frame instead of a vector, and then prepare the query once and run it on all rows of the data frame. This is basically what RSQLite does. I have already made a quick modification to RS_PostgreSQL_pqexecParams to call PQexecPrepared instead on an already prepared statement, and that worked. So it seems its mostly a case of modifying the C code to prepare the query first and then read through the data frame calling PQexecPrepared. 
>>    The problem I'm trying to solve right now is being able to efficiently load 70 million chemical compounds into postgres. I know there are other avenues for accomplishing this, but using R is the best solution in this case. 
>>    Please let me know how I can best help, how you want things done, etc. Thanks. 
>> 
>> Kevin
>> 
>> On Thursday, December 6, 2012 6:57:22 AM UTC-8, Tomoaki wrote:
>> Hi, 
>> 
>> PostgreSQL have library function PQexecParams and also supports prepared statements. 
>> String expansion in the SQL statement is cumbersome for escaping special characters and 
>> therefore error prone. 
>> 
>> I just commited to the SVN repository a very simple and primitive implementation that 
>> allows to pass vector of characters as parameters. 
>> 
>> A sample statement is like: 
>> 
>>     res <- dbGetQuery(con, "SELECT * FROM rockdata WHERE peri > $1 AND shape < $2 LIMIT $3", c(4000, 0.2, 10)) 
>>     print(res) 
>> 
>> The syntax for a positional parameter is a dollar sign ($) followed by digits 
>> rather than a colon followed by digits in PostgreSQL. 
>> http://www.postgresql.org/docs/9.2/static/sql-syntax-lexical.html#SQL-SYNTAX-SPECIAL-CHARS 
>> 
>> This mechanism is required for the support of prepared statements. 
>> It is nicer if I could make automatic conversions for various type and binary transfer, 
>> but this is not implemented right now. 
>> So all parameters are simply passed as strings at the moment. 
>> 
>> Note this is the very initial implementation and the interface may change. 
>> 
>> Any enhancement, feedback, or test case/program is welcome. 
>> Especially, on what would be the best interface/syntax. 
>> 
>> Best regards, 
>> -- 
>> Tomoaki NISHIYAMA 
>> 
>> Advanced Science Research Center, 
>> Kanazawa University, 
>> 13-1 Takara-machi, 
>> Kanazawa, 920-0934, Japan 
>> 
>> 
>> On 2012/12/01, at 0:26, Dirk Eddelbuettel wrote: 
>> 
>> > 
>> > On 30 November 2012 at 15:05, James David Smith wrote: 
>> > | Hi all, 
>> > | 
>> > | Sorry for the thread re-activation. I was wondering if anyone has 
>> > | successfully used the syntax below with the library RPostgreSQL? 
>> > 
>> > Nope. 
>> > 
>> > I always expand the strings explicitly. It would be news to me of that 
>> > worked.  Good news, for sure, but still news... 
>> > 
>> > Dirk 
>> > 
>> > 
>> > | dbGetQuery(con, "update foo set sal = :1 where empno = :2", 
>> > |            data = dat[,c("SAL","EMPNO")]) 
>> > | 
>> > | I've been messing about with it but can't get it to work. I get the error: 
>> > | 
>> > | Error in postgresqlQuickSQL(conn, statement, ...) : 
>> > |   unused argument(s) (data = list(bc = c(NA, NA, NA etc. 
>> > | 
>> > | Thanks 
>> > | 
>> > | James 
>> > | 
>> > | 
>> > | 
>> > | On 28 September 2012 17:13, Denis Mukhin <denis.x... using oracle.com> wrote: 
>> > | > James, 
>> > | > 
>> > | > I have never tried RPostgreSQL before but in ROracle which is also a DBI based interface you can do something like this: 
>> > | > 
>> > | > library(ROracle) 
>> > | > con <- dbConnect(Oracle(), "scott", "tiger") 
>> > | > dbGetQuery(con, "create table foo as select * from emp") 
>> > | > 
>> > | > dat <- dbGetQuery(con, "select * from foo") 
>> > | > dat$SAL <- dat$SAL*10 
>> > | > dbGetQuery(con, "update foo set sal = :1 where empno = :2", 
>> > | >            data = dat[,c("SAL","EMPNO")]) 
>> > | > dbCommit(con) 
>> > | > dbGetQuery(con, "select * from foo") 
>> > | > 
>> > | > dbGetQuery(con, "drop table foo purge") 
>> > | > dbDisconnect(con) 
>> > | > 
>> > | > Denis 
>> > | > 
>> > | > -----Original Message----- 
>> > | > From: Sean Davis [mailto:sda... using mail.nih.gov] 
>> > | > Sent: Friday, September 28, 2012 11:43 AM 
>> > | > To: James David Smith 
>> > | > Cc: r-si... using r-project.org 
>> > | > Subject: Re: [R-sig-DB] R and PostgreSQL - Writing data? 
>> > | > 
>> > | > On Fri, Sep 28, 2012 at 10:36 AM, James David Smith <james.da... using gmail.com> wrote: 
>> > | >> Hi Sean, 
>> > | >> 
>> > | >> Thanks for the reply. I'm familiar with UPDATE queries when working in 
>> > | >> PostgreSQL, but not from within R. Would it look something like this? 
>> > | >> 
>> > | >> dbWriteTable(con, " UPDATE table SET ucam_no2 = 
>> > | >> 'ucam_no2$interpolated_data' ") 
>> > | >> 
>> > | >> My problem is how to get the R data 'within' my SQL statement I think. 
>> > | > 
>> > | > To do an update, you'll need to loop through your data.frame and issue a dbSendQuery().  To create the SQL string, I often use something 
>> > | > like: 
>> > | > 
>> > | > sprintf("UPDATE originalTable SET ucam_no2=%f WHERE originalTable.id = %d",....) 
>> > | > 
>> > | > You can't do this in one step, unfortunately.  This is how UPDATE works and has nothing to do with R. 
>> > | > 
>> > | > Sean 
>> > | > 
>> > | > 
>> > | >> 
>> > | >> On 28 September 2012 15:19, Sean Davis <sda... using mail.nih.gov> wrote: 
>> > | >>> On Fri, Sep 28, 2012 at 10:14 AM, James David Smith 
>> > | >>> <james.da... using gmail.com> wrote: 
>> > | >>>> Dear all, 
>> > | >>>> 
>> > | >>>> Sorry if this isn't quite the right place, but it's the first time 
>> > | >> SendSave NowDiscardDraft autosaved at 15:36 (0 minutes ago) 33% full 
>> > | >> Using 3.4 GB of your 10.1 GB 
>> > | >> �2012 Google - Terms & Privacy 
>> > | >> Last account activity: 50 minutes ago 
>> > | >> Details 
>> > | >> People (2) 
>> > | >> Sean Davis 
>> > | >> Add to circles 
>> > | >> 
>> > | >> Show details 
>> > | >> Ads � Why these ads? 
>> > | >> Big Data Too Slow? 
>> > | >> Real-Time Analytics for Big Data. Visual Drag & Drop UI. Quick & Easy 
>> > | >> PentahoBigData.com Talend Open Source ESB Open Source ESB Based on 
>> > | >> Apache CXF and Apache Camel. Free Download! 
>> > | >> www.talend.com/Free_ESB_Software 
>> > | >> Warp I/O for SQL Server 
>> > | >> Speed SQL Server performance 3x Faster I/O, reduced storage 
>> > | >> www.confio.com/warp-io Storage Container Sussex Ex-Shipping Containers 
>> > | >> Sale & Hire Storage Container 0800 043 6311 
>> > | >> www.CsShippingContainers.co.uk More about... 
>> > | >> MS Access Database SQL � 
>> > | >> Database � 
>> > | >> Excel Database Query � 
>> > | >> Oracle Database Problems � 
>> > | >> 
>> > | >>>> I've posted here. My issue is to do with writing to a PostgreSQL 
>> > | >>>> database from within R. My situation is best explained by some R 
>> > | >>>> code to start: 
>> > | >>>> 
>> > | >>>> #Connect to the database 
>> > | >>>> con <- dbConnect(PostgreSQL(), user="postgres", password="password", 
>> > | >>>> dbname="database") 
>> > | >>>> 
>> > | >>>> #Get some data out of the database. 
>> > | >>>> ucam_no2$original_data <- dbGetQuery(con, "select ucam_no2 FROM 
>> > | >>>> table") 
>> > | >>>> 
>> > | >>>> This returns say 10000 rows of data, but there is only data in about 
>> > | >>>> half of those rows. What I want to do is interpolate the missing 
>> > | >>>> data so I do this: 
>> > | >>>> 
>> > | >>>> #Generate some data 
>> > | >>>> ucam_no2$interpolated_data <- na.approx(ucam_data$ucam_no2, na.rm = 
>> > | >>>> FALSE) 
>> > | >>>> 
>> > | >>>> This works well and I now have 10000 rows of data with no empty cells. 
>> > | >>>> I now want to write this back into my PostgresSQL database. Into the 
>> > | >>>> same row that I took the data from in the first place. But I don't 
>> > | >>>> know how. I can write to a new table with something like the below, 
>> > | >>>> but what I'd really like to do is put the data back into the table I 
>> > | >>>> got it from. 
>> > | >>>> 
>> > | >>>> # Try to write the data back 
>> > | >>>> dbWriteTable(con, "new_data", ucam_no2$interpolated_data) 
>> > | >>> 
>> > | >>> Hi, James. 
>> > | >>> 
>> > | >>> You'll need to look into doing a SQL UPDATE.  That is the standard 
>> > | >>> way to "put data back into the table I got it from". 
>> > | >>> 
>> > | >>> Sean 
>> > | > 
>> > | > _______________________________________________ 
>> > | > R-sig-DB mailing list -- R Special Interest Group R-si... using r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-db 
>> > | > 
>> > | > _______________________________________________ 
>> > | > R-sig-DB mailing list -- R Special Interest Group 
>> > | > R-si... using r-project.org 
>> > | > https://stat.ethz.ch/mailman/listinfo/r-sig-db 
>> > | 
>> > | _______________________________________________ 
>> > | R-sig-DB mailing list -- R Special Interest Group 
>> > | R-si... using r-project.org 
>> > | https://stat.ethz.ch/mailman/listinfo/r-sig-db 
>> > 
>> > -- 
>> > Dirk Eddelbuettel | e... using debian.org | http://dirk.eddelbuettel.com   
>> > 
>> > _______________________________________________ 
>> > R-sig-DB mailing list -- R Special Interest Group 
>> > R-si... using r-project.org 
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-db 
>> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups "RPostgreSQL Development and Discussion List" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to rpostgresql-dev+unsubscribe using googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  


	[[alternative HTML version deleted]]




More information about the R-sig-DB mailing list