[R-sig-DB] Timestamp with time zone type conversion from PostgreSQL to R

Gabor Grothendieck ggrothend|eck @end|ng |rom gm@||@com
Tue Mar 23 16:16:05 CET 2010


This works for me on Windows Vista with the development version of the
RpgSQL package's DBI/RJDBC driver.

The CRAN version of the RpgSQL driver gives problems here too and if
we use Sys.setenv(TZ = "GMT") or Sys.setenv(TZ = "UTC") we also have
problems even with the development version of the RpgSQL package's
driver.

>    library(RpgSQL)
>    Sys.setenv(TZ="")        # NOTE
>
>    # may need to modify next statement based on your setup
>    databaseConnection = dbConnect(dbDriver("pgSQL"), dbname = "mydb")
>
>    # The input: any day, midnight, in UTC
>    query = "SELECT TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC'"
>
>    result = dbGetQuery(databaseConnection, query)
>
>    strftime(result$timestamptz, usetz=TRUE)  # prints:"1999-12-31 01:00:00 UTC"
[1] "1999-12-30 19:00:00 EST"

>    as.numeric(result$timestamptz)
[1] 946598400
> format(result, tz = "GMT")
  timestamptz
1  1999-12-31


> dbGetQuery(databaseConnection, "select version()")
                                                      version
1 PostgreSQL 8.4.2, compiled by Visual C++ build 1400, 32-bit
> R.version.string
[1] "R version 2.10.1 Patched (2010-03-10 r51276)"
> win.version()
[1] "Windows Vista (build 6002) Service Pack 2"

On Tue, Mar 9, 2010 at 4:56 AM, Koen Stegen <koen.stegen using oma.be> wrote:
> Dear all,
>
> The maintainer of the RPostgreSQL package suggested to post this on R-sig-DB,
> because of Linus' Law.
>
> My apologies for a long post, but I hope the explanation makes life easy on the
> eyeballs.
>
> While doing a dbReadTable on a PostgreSQL database, I have noticed that some
> columns with type "timestamp with time zone" behave differently from what I
> expect: in R there seems to be a time zone related offset in the data values.
>
> I have reduced the problem to the following code:
>
>    library("RPostgreSQL")
>    Sys.setenv(TZ="UTC")        # Just to level the playing field, not crucial
>
>    databaseConnection = dbConnect(dbDriver("PostgreSQL"), dbname="template1",
>                                   user="postgres", password="*****")
>
>    # The input: any day, midnight, in UTC
>    query = "SELECT TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC'"
>    result = dbGetQuery(databaseConnection, query)
>
>    # The problem:
>    strftime(result$timestamptz, usetz=TRUE)  # prints:"1999-12-31 01:00:00 UTC"
>    as.numeric(result$timestamptz)            # prints: 946602000
>
>
> The value of strftime is 1 am instead of midnight. The numeric value is 3600
> seconds higher than I expect, so it is not *just* a display problem.
>
> First, I have checked my expectations with the date command in a Bash shell:
>  date --date='1999-12-31 00:00:00 UTC' +%s
>
> Then I have checked my database with the psql command line client:
>  SELECT EXTRACT(epoch FROM TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC');
>
> Both show: 946598400, 3600 less than what R shows.
>
> To verify that this is indeed the correct value, I have used the online tool
> from epochconverter dot com, and the human readable version of this time is, as
> expected, midnight:  Fri, 31 Dec 1999 00:00:00 GMT
>
> This was all done on:
>
>    OS: OpenSUSE 11.2
>    R: R version 2.10.1 Patched (2010-03-07 r51225)
>    DBI: 0.2-5
>    RPostgreSQL: 0.1-6
>    PostgreSQL: 8.4.2
>
>
> I have repeated this procedure on a Windows XP machine. To get the RPostgreSQL
> package working, I have manually copied some .dll files to the system32
> directory, but I think this is not relevant.
>
> The result on Windows may actually shed some light on what is going on:
>
>
> Warning messages:
> 1: In strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
>  unknown timezone '%Y-%m-%d %H:%M:%S'
> 2: In structure(.Internal(as.POSIXct(x, tz)), class = c("POSIXt", "POSIXct"),  :
>  unknown timezone '%Y-%m-%d %H:%M:%S'
> 3: In strptime(x, f, tz = tz) : unknown timezone '%Y-%m-%d %H:%M:%S'
> 4: In structure(.Internal(as.POSIXct(x, tz)), class = c("POSIXt", "POSIXct"),  :
>  unknown timezone '%Y-%m-%d %H:%M:%S'
>
>
> It is clear that '%Y-%m-%d %H:%M:%S' is a date format, and not a time zone.
>
> My guess would be that this is simple copy-paste mistake in the code, unintended
> recycling, or an off-by-one in a C char**.
>
>
> I don't have sufficient expertise to test this against other databases (afaics
> MySQL doesn't even have a data type that includes the time zone) so I don't know
> whether this is a problem specific to RPostgreSQL, or it encompasses all DBI
> implementations.
>
> Could any of the RPostgreSQL developers please have a look at this, and pass it
> on to DBI and/or R core devs if necessary?
>
>
> Thanks in advance,
> Koen
> Royal Meteorological Institute of Belgium
>
> _______________________________________________
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB using stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>




More information about the R-sig-DB mailing list