[BioC] dbWriteTable Failure

Dirk Eddelbuettel edd at debian.org
Fri Jan 14 20:54:12 CET 2011


Elliot,

On 14 January 2011 at 14:35, Elliot Joel Bernstein wrote:
| Dirk -
| 
| Thank you for following up on this. It's possible I'm just misunderstanding the code, but I grabbed the source for RPostgreSQL ver. 0.1-7, and as far as I can tell, the work for dbWriteTable (in postgresqlWriteTable) still works by creating a temporary file. (Lines 649-660 of PostgreSQLSupport.R.)

Hm. Looking at the code, I see that you have a point there.  Maybe Tomoaki
can clarify.   

We did move a couple of bugs out of the way and I was under the impression
that this had been taken care of too.  If not, well, you always have your
money back guarantee.... ;-)

And in case that wasn't patently obvious, we *do* welcome welcome patches.

Cheers, Dirk

| 
| - Elliot
| 
| On Fri, Jan 14, 2011 at 11:56:47AM -0600, Dirk Eddelbuettel wrote:
| > 
| > On 14 January 2011 at 12:46, Sean Davis wrote:
| > | 
| > | 
| > | On Fri, Jan 14, 2011 at 12:36 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
| > | 
| > |     On 01/14/2011 06:25 AM, Elliot Joel Bernstein wrote:
| > |     > Martin -
| > |     >
| > |     > Thanks for your reply. Is RdbiPgSQL no longer maintained at all? I
| > | 
| > |     Hi Elliot --
| > | 
| > |     (sorry for the last post, oops). The package hasn't had any substantial
| > |     change since 2006. I'm cc'ing the currently listed maintainer, and also
| > |     the maintainer of RPostgreSQL, who might have additional insights.
| > | 
| > | 
| > | 
| > | Perhaps Dirk will comment directly but, with regards to RPosgreSQL, there is a
| > | google code site and an issue for dbWriteTable was already filed.
| > | 
| > | http://code.google.com/p/rpostgresql/issues/detail?id=4&can=1
| > | 
| > | The resolution is that Dirk would appreciate a patch (which I have not been
| > | able to supply--someone else?).
| > 
| > Did the OP try a current version?   NEWS follows below.  Tomoaki has done
| > some wonderful work and we are in much better shape than we were, say, one
| > year ago.
| > 
| > That said, there are always open issues.  See the Google Code site; and I cc
| > out dev list.
| > 
| > Dirk
| > 
| > Version 0.1-7 -- 2010-10-17
| > 
| >     o   Several potential buffer overruns were fixed
| > 
| >     o   dbWriteTable now writes a data.frame to database through a network
| >         connection rather than a temporary file. Note that row_names may be
| >         changed in future releases.  Also, passing in filenames instead of
| >         data.frame is not supported at this time. 
| > 
| >     o   When no host is specified, a connection to the PostgreSQL server 
| >         is made via UNIX domain socket (just like psql does)
| > 
| >     o   Table and column names are case sensitive, and identifiers are escaped
| >         or quoted appropriately, so that any form of table/column names can be
| >         created, searched, or removed, including upper-, lower- and mixed-case.
| > 
| >     o   nullOk in dbColumnInfo has a return value of NA when the column does
| >         not correspond to a column in the table. The utility of nullOk is
| >         doubtful but not removed at this time.
| > 
| >     o   Correct Windows getpid() declaration (with thanks to Brian D. Ripley)
| > 
| >     o   A call of as.POSIXct() with a time format string wrongly passed to TZ
| >         has been corrected; this should help with intra-day timestamps (with
| >         thanks to Steve Eick)
| > 
| >     o   Usage of tmpdir has been improved on similarly to Linux (with thanks
| >         to Robert McGehee)
| > 
| > Dirk
| > 
| > 
| > | 
| > | Sean
| > | 
| > |  
| > | 
| > |     Martin
| > | 
| > |     have tried RPostgreSQL, but it has an even worse problem. In that
| > |     package, dbWriteTable is implemented by calling write.table on the data
| > |     frame to write it to a temporary file, and then load it into the
| > |     database, which means that the formatting options set in R affect the
| > |     result. Specifically, write.table by default formats some numbers in
| > |     scientific notation, which causes the database load to fail.
| > |     >
| > |     > After looking into the problem with RdbiPgSQL a little more, it
| > |     > seems
| > |     that it's caused by dbDisconnect calling the same function that is
| > |     registered as the finalizer. I think it could be easily solved by using
| > |     an indicator variable to keep track of whether PQfinish has been called.
| > |     (I can also just call rm instead of dbDisconnect, but it's a little
| > |     disconcerting that the package contains a function that causes memory
| > |     errors when used as intended.)
| > | 
| > |     > - Elliot
| > |     >
| > |     >
| > |     > On Tue, Jan 11, 2011 at 08:12:21PM -0800, Martin Morgan wrote:
| > |     >> On 01/11/2011 11:37 AM, Elliot Joel Bernstein wrote:
| > |     >>> I am having a problem using the dbWriteTable function in package
| > |     RdbiPgSQL to insert data into a PostgreSQL table. Specifically, the
| > |     following code sometimes works, but sometimes fails with an error message
| > |     indicating that there is "no database connection":
| > |     >>>
| > |     >>>
| > |     >>>
| > |     >>> =====================================================================
| > |     >>> dbtest.R
| > |     >>> =====================================================================
| > |     >>>
| > |     >>> require(RdbiPgSQL)
| > |     >>
| > |     >> Hi Elliot -- not so much an answer, but unless RdbiPgSQL is central you
| > |     >> might try the CRAN package RPostgreSQL, which is actively maintained.
| > |     >>
| > |     >> Martin
| > |     >>
| > |     >>>
| > |     >>> test <- function(conn, n) {
| > |     >>>
| > |     >>>   dbSendQuery(conn, "create temp table temp (x int, y int)")
| > |     >>>
| > |     >>>   n <- 10000
| > |     >>>   dat <- data.frame(x=sample(1000,n,replace=T), y=sample(1000,n,replace
| > |     =T))
| > |     >>>
| > |     >>>   for (i in 1:dim(dat)[1]) {
| > |     >>>
| > |     >>>     err <- try(dbSendQuery(conn, sprintf("insert into temp values
| > |     (%d,%d)", dat$x[i], dat$y[i])))
| > |     >>>     if (inherits(err, "try-error")) {
| > |     >>>       print(conn)
| > |     >>>       break
| > |     >>>     }
| > |     >>>
| > |     >>>   }
| > |     >>>
| > |     >>>   print(dbGetQuery(conn, "select count(1) from temp"))
| > |     >>>
| > |     >>> }
| > |     >>>
| > |     >>> conn <- dbConnect(PgSQL(), dbname="gf", host="gf-host")
| > |     >>> test(conn, 10000)
| > |     >>> dbDisconnect(conn)
| > |     >>>
| > |     >>> =====================================================================
| > |     >>>
| > |     >>>
| > |     >>>
| > |     >>> The following is sample output from multiple runs of this script:
| > |     >>>
| > |     >>>
| > |     >>>
| > |     >>> =====================================================================
| > |     >>> Sample output
| > |     >>> =====================================================================
| > |     >>>
| > |     >>> R version 2.12.0 (2010-10-15)
| > |     >>> Copyright (C) 2010 The R Foundation for Statistical Computing
| > |     >>> ISBN 3-900051-07-0
| > |     >>> Platform: x86_64-redhat-linux-gnu (64-bit)
| > |     >>>
| > |     >>> R is free software and comes with ABSOLUTELY NO WARRANTY.
| > |     >>> You are welcome to redistribute it under certain conditions.
| > |     >>> Type 'license()' or 'licence()' for distribution details.
| > |     >>>
| > |     >>>   Natural language support but running in an English locale
| > |     >>>
| > |     >>> R is a collaborative project with many contributors.
| > |     >>> Type 'contributors()' for more information and
| > |     >>> 'citation()' on how to cite R or R packages in publications.
| > |     >>>
| > |     >>> Type 'demo()' for some demos, 'help()' for on-line help, or
| > |     >>> 'help.start()' for an HTML browser interface to help.
| > |     >>> Type 'q()' to quit R.
| > |     >>>
| > |     >>>> source("dbtest.R")
| > |     >>> Loading required package: RdbiPgSQL
| > |     >>> Loading required package: Rdbi
| > |     >>>   count
| > |     >>> 1 10000
| > |     >>>> source("dbtest.R")
| > |     >>> Error in dbSendQuery.PgSQL.conn(conn, sprintf("insert into temp values
| > |     (%d,%d)",  :
| > |     >>>   No database connection
| > |     >>> status = 1
| > |     >>> database.name =  a
| > |     >>> host.name = p��
| > |     >>> options = 8}G
| > |     >>> password = P�
| > |     >>> port = �
| > |     >>> tty = ��
| > |     >>> socket = -1
| > |     >>> client.encoding = -1
| > |     >>> backend.pid = 0
| > |     >>> Error in dbSendQuery.PgSQL.conn(conn, ...) : No database connection
| > |     >>>
| > |     >>> =====================================================================
| > |     >>>
| > |     >>>
| > |     >>>
| > |     >>> Here is the result of a traceback:
| > |     >>>
| > |     >>>
| > |     >>>
| > |     >>> =====================================================================
| > |     >>> traceback
| > |     >>> =====================================================================
| > |     >>>
| > |     >>>> traceback()
| > |     >>> 11: .Call("PgSQLsendQuery", conn, query, PACKAGE = "RdbiPgSQL")
| > |     >>> 10: dbSendQuery.PgSQL.conn(conn, ...) at dbtest.R#21
| > |     >>> 9: dbSendQuery(conn, ...) at dbtest.R#21
| > |     >>> 8: dbGetResult(dbSendQuery(conn, ...)) at dbtest.R#21
| > |     >>> 7: dbGetQuery.PgSQL.conn(conn, "select count(1) from temp") at dbtest.R
| > |     #21
| > |     >>> 6: dbGetQuery(conn, "select count(1) from temp") at dbtest.R#21
| > |     >>> 5: print(dbGetQuery(conn, "select count(1) from temp")) at dbtest.R#21
| > |     >>> 4: test(conn, 10000)
| > |     >>> 3: eval.with.vis(expr, envir, enclos)
| > |     >>> 2: eval.with.vis(ei, envir)
| > |     >>> 1: source("dbtest.R")
| > |     >>>
| > |     >>> =====================================================================
| > |     >>>
| > |     >>>
| > |     >>>
| > |     >>> And here is my sessionInfo:
| > |     >>>
| > |     >>>
| > |     >>>
| > |     >>> =====================================================================
| > |     >>> sessionInfo
| > |     >>> =====================================================================
| > |     >>>
| > |     >>>> sessionInfo()
| > |     >>> R version 2.12.0 (2010-10-15)
| > |     >>> Platform: x86_64-redhat-linux-gnu (64-bit)
| > |     >>>
| > |     >>> locale:
| > |     >>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
| > |     >>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
| > |     >>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
| > |     >>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
| > |     >>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
| > |     >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
| > |     >>>
| > |     >>> attached base packages:
| > |     >>> [1] stats     graphics  grDevices utils     datasets  methods   base
| > |     >>>
| > |     >>> other attached packages:
| > |     >>> [1] RdbiPgSQL_1.24.0 Rdbi_1.24.0
| > |     >>>
| > |     >>> =====================================================================
| > |     >>>
| > |     >>>
| > |     >>>
| > |     >>> I am using version 8.4.4 of the PostgreSQL-devel libraries, and the
| > |     server is running version 8.4.1. The problem does not always occur on the
| > |     second run; I am sometimes able to run the script multiple times before it
| > |     eventually fails. Also, it does not always fail at the same row of the
| > |     dataframe, nor is there anything obviously wrong with the data values on
| > |     the row where it fails. I would appreciate any help in diagnosing the
| > |     problem.
| > |     >>>
| > |     >>> Thank you very much.
| > |     >>>
| > |     >>> - Elliot
| > |     >>>
| > |     >>> _______________________________________________
| > |     >>> Bioconductor mailing list
| > |     >>> Bioconductor at r-project.org
| > |     >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
| > |     >>> Search the archives: http://news.gmane.org/
| > |     gmane.science.biology.informatics.conductor
| > |     >>
| > |     >>
| > |     >> --
| > |     >> Computational Biology
| > |     >> Fred Hutchinson Cancer Research Center
| > |     >> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
| > |     >>
| > |     >> Location: M1-B861
| > |     >> Telephone: 206 667-2793
| > | 
| > | 
| > |     --
| > |     Computational Biology
| > |     Fred Hutchinson Cancer Research Center
| > |     1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
| > | 
| > |     Location: M1-B861
| > |     Telephone: 206 667-2793
| > | 
| > |     _______________________________________________
| > |     Bioconductor mailing list
| > |     Bioconductor at r-project.org
| > |     https://stat.ethz.ch/mailman/listinfo/bioconductor
| > |     Search the archives: http://news.gmane.org/
| > |     gmane.science.biology.informatics.conductor
| > | 
| > | 
| > 
| > -- 
| > Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com



More information about the Bioconductor mailing list