[BioC] dbWriteTable Failure

Elliot Joel Bernstein elliot.bernstein at fdopartners.com
Fri Jan 14 20:35:51 CET 2011


Dirk -

Thank you for following up on this. It's possible I'm just misunderstanding the code, but I grabbed the source for RPostgreSQL ver. 0.1-7, and as far as I can tell, the work for dbWriteTable (in postgresqlWriteTable) still works by creating a temporary file. (Lines 649-660 of PostgreSQLSupport.R.)

- Elliot

On Fri, Jan 14, 2011 at 11:56:47AM -0600, Dirk Eddelbuettel wrote:
> 
> On 14 January 2011 at 12:46, Sean Davis wrote:
> | 
> | 
> | On Fri, Jan 14, 2011 at 12:36 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> | 
> |     On 01/14/2011 06:25 AM, Elliot Joel Bernstein wrote:
> |     > Martin -
> |     >
> |     > Thanks for your reply. Is RdbiPgSQL no longer maintained at all? I
> | 
> |     Hi Elliot --
> | 
> |     (sorry for the last post, oops). The package hasn't had any substantial
> |     change since 2006. I'm cc'ing the currently listed maintainer, and also
> |     the maintainer of RPostgreSQL, who might have additional insights.
> | 
> | 
> | 
> | Perhaps Dirk will comment directly but, with regards to RPosgreSQL, there is a
> | google code site and an issue for dbWriteTable was already filed.
> | 
> | http://code.google.com/p/rpostgresql/issues/detail?id=4&can=1
> | 
> | The resolution is that Dirk would appreciate a patch (which I have not been
> | able to supply--someone else?).
> 
> Did the OP try a current version?   NEWS follows below.  Tomoaki has done
> some wonderful work and we are in much better shape than we were, say, one
> year ago.
> 
> That said, there are always open issues.  See the Google Code site; and I cc
> out dev list.
> 
> Dirk
> 
> Version 0.1-7 -- 2010-10-17
> 
>     o   Several potential buffer overruns were fixed
> 
>     o   dbWriteTable now writes a data.frame to database through a network
>         connection rather than a temporary file. Note that row_names may be
>         changed in future releases.  Also, passing in filenames instead of
>         data.frame is not supported at this time. 
> 
>     o   When no host is specified, a connection to the PostgreSQL server 
>         is made via UNIX domain socket (just like psql does)
> 
>     o   Table and column names are case sensitive, and identifiers are escaped
>         or quoted appropriately, so that any form of table/column names can be
>         created, searched, or removed, including upper-, lower- and mixed-case.
> 
>     o   nullOk in dbColumnInfo has a return value of NA when the column does
>         not correspond to a column in the table. The utility of nullOk is
>         doubtful but not removed at this time.
> 
>     o   Correct Windows getpid() declaration (with thanks to Brian D. Ripley)
> 
>     o   A call of as.POSIXct() with a time format string wrongly passed to TZ
>         has been corrected; this should help with intra-day timestamps (with
>         thanks to Steve Eick)
> 
>     o   Usage of tmpdir has been improved on similarly to Linux (with thanks
>         to Robert McGehee)
> 
> Dirk
> 
> 
> | 
> | Sean
> | 
> |  
> | 
> |     Martin
> | 
> |     have tried RPostgreSQL, but it has an even worse problem. In that
> |     package, dbWriteTable is implemented by calling write.table on the data
> |     frame to write it to a temporary file, and then load it into the
> |     database, which means that the formatting options set in R affect the
> |     result. Specifically, write.table by default formats some numbers in
> |     scientific notation, which causes the database load to fail.
> |     >
> |     > After looking into the problem with RdbiPgSQL a little more, it
> |     > seems
> |     that it's caused by dbDisconnect calling the same function that is
> |     registered as the finalizer. I think it could be easily solved by using
> |     an indicator variable to keep track of whether PQfinish has been called.
> |     (I can also just call rm instead of dbDisconnect, but it's a little
> |     disconcerting that the package contains a function that causes memory
> |     errors when used as intended.)
> | 
> |     > - Elliot
> |     >
> |     >
> |     > On Tue, Jan 11, 2011 at 08:12:21PM -0800, Martin Morgan wrote:
> |     >> On 01/11/2011 11:37 AM, Elliot Joel Bernstein wrote:
> |     >>> I am having a problem using the dbWriteTable function in package
> |     RdbiPgSQL to insert data into a PostgreSQL table. Specifically, the
> |     following code sometimes works, but sometimes fails with an error message
> |     indicating that there is "no database connection":
> |     >>>
> |     >>>
> |     >>>
> |     >>> =====================================================================
> |     >>> dbtest.R
> |     >>> =====================================================================
> |     >>>
> |     >>> require(RdbiPgSQL)
> |     >>
> |     >> Hi Elliot -- not so much an answer, but unless RdbiPgSQL is central you
> |     >> might try the CRAN package RPostgreSQL, which is actively maintained.
> |     >>
> |     >> Martin
> |     >>
> |     >>>
> |     >>> test <- function(conn, n) {
> |     >>>
> |     >>>   dbSendQuery(conn, "create temp table temp (x int, y int)")
> |     >>>
> |     >>>   n <- 10000
> |     >>>   dat <- data.frame(x=sample(1000,n,replace=T), y=sample(1000,n,replace
> |     =T))
> |     >>>
> |     >>>   for (i in 1:dim(dat)[1]) {
> |     >>>
> |     >>>     err <- try(dbSendQuery(conn, sprintf("insert into temp values
> |     (%d,%d)", dat$x[i], dat$y[i])))
> |     >>>     if (inherits(err, "try-error")) {
> |     >>>       print(conn)
> |     >>>       break
> |     >>>     }
> |     >>>
> |     >>>   }
> |     >>>
> |     >>>   print(dbGetQuery(conn, "select count(1) from temp"))
> |     >>>
> |     >>> }
> |     >>>
> |     >>> conn <- dbConnect(PgSQL(), dbname="gf", host="gf-host")
> |     >>> test(conn, 10000)
> |     >>> dbDisconnect(conn)
> |     >>>
> |     >>> =====================================================================
> |     >>>
> |     >>>
> |     >>>
> |     >>> The following is sample output from multiple runs of this script:
> |     >>>
> |     >>>
> |     >>>
> |     >>> =====================================================================
> |     >>> Sample output
> |     >>> =====================================================================
> |     >>>
> |     >>> R version 2.12.0 (2010-10-15)
> |     >>> Copyright (C) 2010 The R Foundation for Statistical Computing
> |     >>> ISBN 3-900051-07-0
> |     >>> Platform: x86_64-redhat-linux-gnu (64-bit)
> |     >>>
> |     >>> R is free software and comes with ABSOLUTELY NO WARRANTY.
> |     >>> You are welcome to redistribute it under certain conditions.
> |     >>> Type 'license()' or 'licence()' for distribution details.
> |     >>>
> |     >>>   Natural language support but running in an English locale
> |     >>>
> |     >>> R is a collaborative project with many contributors.
> |     >>> Type 'contributors()' for more information and
> |     >>> 'citation()' on how to cite R or R packages in publications.
> |     >>>
> |     >>> Type 'demo()' for some demos, 'help()' for on-line help, or
> |     >>> 'help.start()' for an HTML browser interface to help.
> |     >>> Type 'q()' to quit R.
> |     >>>
> |     >>>> source("dbtest.R")
> |     >>> Loading required package: RdbiPgSQL
> |     >>> Loading required package: Rdbi
> |     >>>   count
> |     >>> 1 10000
> |     >>>> source("dbtest.R")
> |     >>> Error in dbSendQuery.PgSQL.conn(conn, sprintf("insert into temp values
> |     (%d,%d)",  :
> |     >>>   No database connection
> |     >>> status = 1
> |     >>> database.name =  a
> |     >>> host.name = p��
> |     >>> options = 8}G
> |     >>> password = P�
> |     >>> port = �
> |     >>> tty = ��
> |     >>> socket = -1
> |     >>> client.encoding = -1
> |     >>> backend.pid = 0
> |     >>> Error in dbSendQuery.PgSQL.conn(conn, ...) : No database connection
> |     >>>
> |     >>> =====================================================================
> |     >>>
> |     >>>
> |     >>>
> |     >>> Here is the result of a traceback:
> |     >>>
> |     >>>
> |     >>>
> |     >>> =====================================================================
> |     >>> traceback
> |     >>> =====================================================================
> |     >>>
> |     >>>> traceback()
> |     >>> 11: .Call("PgSQLsendQuery", conn, query, PACKAGE = "RdbiPgSQL")
> |     >>> 10: dbSendQuery.PgSQL.conn(conn, ...) at dbtest.R#21
> |     >>> 9: dbSendQuery(conn, ...) at dbtest.R#21
> |     >>> 8: dbGetResult(dbSendQuery(conn, ...)) at dbtest.R#21
> |     >>> 7: dbGetQuery.PgSQL.conn(conn, "select count(1) from temp") at dbtest.R
> |     #21
> |     >>> 6: dbGetQuery(conn, "select count(1) from temp") at dbtest.R#21
> |     >>> 5: print(dbGetQuery(conn, "select count(1) from temp")) at dbtest.R#21
> |     >>> 4: test(conn, 10000)
> |     >>> 3: eval.with.vis(expr, envir, enclos)
> |     >>> 2: eval.with.vis(ei, envir)
> |     >>> 1: source("dbtest.R")
> |     >>>
> |     >>> =====================================================================
> |     >>>
> |     >>>
> |     >>>
> |     >>> And here is my sessionInfo:
> |     >>>
> |     >>>
> |     >>>
> |     >>> =====================================================================
> |     >>> sessionInfo
> |     >>> =====================================================================
> |     >>>
> |     >>>> sessionInfo()
> |     >>> R version 2.12.0 (2010-10-15)
> |     >>> Platform: x86_64-redhat-linux-gnu (64-bit)
> |     >>>
> |     >>> locale:
> |     >>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> |     >>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> |     >>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> |     >>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> |     >>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> |     >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> |     >>>
> |     >>> attached base packages:
> |     >>> [1] stats     graphics  grDevices utils     datasets  methods   base
> |     >>>
> |     >>> other attached packages:
> |     >>> [1] RdbiPgSQL_1.24.0 Rdbi_1.24.0
> |     >>>
> |     >>> =====================================================================
> |     >>>
> |     >>>
> |     >>>
> |     >>> I am using version 8.4.4 of the PostgreSQL-devel libraries, and the
> |     server is running version 8.4.1. The problem does not always occur on the
> |     second run; I am sometimes able to run the script multiple times before it
> |     eventually fails. Also, it does not always fail at the same row of the
> |     dataframe, nor is there anything obviously wrong with the data values on
> |     the row where it fails. I would appreciate any help in diagnosing the
> |     problem.
> |     >>>
> |     >>> Thank you very much.
> |     >>>
> |     >>> - Elliot
> |     >>>
> |     >>> _______________________________________________
> |     >>> Bioconductor mailing list
> |     >>> Bioconductor at r-project.org
> |     >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> |     >>> Search the archives: http://news.gmane.org/
> |     gmane.science.biology.informatics.conductor
> |     >>
> |     >>
> |     >> --
> |     >> Computational Biology
> |     >> Fred Hutchinson Cancer Research Center
> |     >> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> |     >>
> |     >> Location: M1-B861
> |     >> Telephone: 206 667-2793
> | 
> | 
> |     --
> |     Computational Biology
> |     Fred Hutchinson Cancer Research Center
> |     1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> | 
> |     Location: M1-B861
> |     Telephone: 206 667-2793
> | 
> |     _______________________________________________
> |     Bioconductor mailing list
> |     Bioconductor at r-project.org
> |     https://stat.ethz.ch/mailman/listinfo/bioconductor
> |     Search the archives: http://news.gmane.org/
> |     gmane.science.biology.informatics.conductor
> | 
> | 
> 
> -- 
> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com



More information about the Bioconductor mailing list