[R-sig-DB] Timestamp with time zone type conversion from PostgreSQL to R

Koen Stegen koen@@tegen @end|ng |rom om@@be
Tue Mar 9 09:56:40 CET 2010


Dear all,

The maintainer of the RPostgreSQL package suggested to post this on R-sig-DB,
because of Linus' Law.

My apologies for a long post, but I hope the explanation makes life easy on the
eyeballs.

While doing a dbReadTable on a PostgreSQL database, I have noticed that some
columns with type "timestamp with time zone" behave differently from what I
expect: in R there seems to be a time zone related offset in the data values.

I have reduced the problem to the following code:

    library("RPostgreSQL")
    Sys.setenv(TZ="UTC")        # Just to level the playing field, not crucial

    databaseConnection = dbConnect(dbDriver("PostgreSQL"), dbname="template1",
                                   user="postgres", password="*****")

    # The input: any day, midnight, in UTC
    query = "SELECT TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC'"
    result = dbGetQuery(databaseConnection, query)

    # The problem:
    strftime(result$timestamptz, usetz=TRUE)  # prints:"1999-12-31 01:00:00 UTC"
    as.numeric(result$timestamptz)            # prints: 946602000


The value of strftime is 1 am instead of midnight. The numeric value is 3600
seconds higher than I expect, so it is not *just* a display problem.

First, I have checked my expectations with the date command in a Bash shell:
  date --date='1999-12-31 00:00:00 UTC' +%s

Then I have checked my database with the psql command line client:
  SELECT EXTRACT(epoch FROM TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC');

Both show: 946598400, 3600 less than what R shows.

To verify that this is indeed the correct value, I have used the online tool
from epochconverter dot com, and the human readable version of this time is, as
expected, midnight:  Fri, 31 Dec 1999 00:00:00 GMT

This was all done on:

    OS: OpenSUSE 11.2
    R: R version 2.10.1 Patched (2010-03-07 r51225)
    DBI: 0.2-5
    RPostgreSQL: 0.1-6
    PostgreSQL: 8.4.2


I have repeated this procedure on a Windows XP machine. To get the RPostgreSQL
package working, I have manually copied some .dll files to the system32
directory, but I think this is not relevant.

The result on Windows may actually shed some light on what is going on:


Warning messages:
1: In strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
  unknown timezone '%Y-%m-%d %H:%M:%S'
2: In structure(.Internal(as.POSIXct(x, tz)), class = c("POSIXt", "POSIXct"),  :
  unknown timezone '%Y-%m-%d %H:%M:%S'
3: In strptime(x, f, tz = tz) : unknown timezone '%Y-%m-%d %H:%M:%S'
4: In structure(.Internal(as.POSIXct(x, tz)), class = c("POSIXt", "POSIXct"),  :
  unknown timezone '%Y-%m-%d %H:%M:%S'


It is clear that '%Y-%m-%d %H:%M:%S' is a date format, and not a time zone.

My guess would be that this is simple copy-paste mistake in the code, unintended
recycling, or an off-by-one in a C char**.


I don't have sufficient expertise to test this against other databases (afaics
MySQL doesn't even have a data type that includes the time zone) so I don't know
whether this is a problem specific to RPostgreSQL, or it encompasses all DBI
implementations.

Could any of the RPostgreSQL developers please have a look at this, and pass it
on to DBI and/or R core devs if necessary?


Thanks in advance,
Koen
Royal Meteorological Institute of Belgium




More information about the R-sig-DB mailing list