[R-sig-DB] Timestamp with time zone type conversion from PostgreSQL to R
Koen Stegen
koen@@tegen @end|ng |rom om@@be
Tue Mar 9 09:56:40 CET 2010
Dear all,
The maintainer of the RPostgreSQL package suggested to post this on R-sig-DB,
because of Linus' Law.
My apologies for a long post, but I hope the explanation makes life easy on the
eyeballs.
While doing a dbReadTable on a PostgreSQL database, I have noticed that some
columns with type "timestamp with time zone" behave differently from what I
expect: in R there seems to be a time zone related offset in the data values.
I have reduced the problem to the following code:
library("RPostgreSQL")
Sys.setenv(TZ="UTC") # Just to level the playing field, not crucial
databaseConnection = dbConnect(dbDriver("PostgreSQL"), dbname="template1",
user="postgres", password="*****")
# The input: any day, midnight, in UTC
query = "SELECT TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC'"
result = dbGetQuery(databaseConnection, query)
# The problem:
strftime(result$timestamptz, usetz=TRUE) # prints:"1999-12-31 01:00:00 UTC"
as.numeric(result$timestamptz) # prints: 946602000
The value of strftime is 1 am instead of midnight. The numeric value is 3600
seconds higher than I expect, so it is not *just* a display problem.
First, I have checked my expectations with the date command in a Bash shell:
date --date='1999-12-31 00:00:00 UTC' +%s
Then I have checked my database with the psql command line client:
SELECT EXTRACT(epoch FROM TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC');
Both show: 946598400, 3600 less than what R shows.
To verify that this is indeed the correct value, I have used the online tool
from epochconverter dot com, and the human readable version of this time is, as
expected, midnight: Fri, 31 Dec 1999 00:00:00 GMT
This was all done on:
OS: OpenSUSE 11.2
R: R version 2.10.1 Patched (2010-03-07 r51225)
DBI: 0.2-5
RPostgreSQL: 0.1-6
PostgreSQL: 8.4.2
I have repeated this procedure on a Windows XP machine. To get the RPostgreSQL
package working, I have manually copied some .dll files to the system32
directory, but I think this is not relevant.
The result on Windows may actually shed some light on what is going on:
Warning messages:
1: In strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
unknown timezone '%Y-%m-%d %H:%M:%S'
2: In structure(.Internal(as.POSIXct(x, tz)), class = c("POSIXt", "POSIXct"), :
unknown timezone '%Y-%m-%d %H:%M:%S'
3: In strptime(x, f, tz = tz) : unknown timezone '%Y-%m-%d %H:%M:%S'
4: In structure(.Internal(as.POSIXct(x, tz)), class = c("POSIXt", "POSIXct"), :
unknown timezone '%Y-%m-%d %H:%M:%S'
It is clear that '%Y-%m-%d %H:%M:%S' is a date format, and not a time zone.
My guess would be that this is simple copy-paste mistake in the code, unintended
recycling, or an off-by-one in a C char**.
I don't have sufficient expertise to test this against other databases (afaics
MySQL doesn't even have a data type that includes the time zone) so I don't know
whether this is a problem specific to RPostgreSQL, or it encompasses all DBI
implementations.
Could any of the RPostgreSQL developers please have a look at this, and pass it
on to DBI and/or R core devs if necessary?
Thanks in advance,
Koen
Royal Meteorological Institute of Belgium
More information about the R-sig-DB
mailing list