[R] Time Zone problems: midnight goes in; 8am comes out

Boylan, Ross
Wed Mar 2 02:51:04 CET 2022

I'm having problems with timezones using lubridate, but it's not clear to me the difficulty is in lubridate.
> r2 <- parse_date_time("1970-01-01 00:01:00", "ymd HMS", tz="PST")
> r2
[1] "1970-01-01 08:01:00 PST"  ## Oops: midnight has turned into 8am
> as.numeric(r2)
[1] 28860
> 8*3600 # seconds in 8 hours
[1] 28800
lubridate accepts PST as the time zone, and the result prints "PST" for timezone.  Further, lubridate seems to be using the tz properly since it gets the 8 hour offset from UTC correct.

The problem is that the printed value shows the UTC time, 08:01, despite carrying the PST suffix.  So the time appears to have jumped 8 hours ahead of the value parsed.

PST appears not to be a legal timezone (in spite of lubridate inferring the correct offset from it):
> Sys.timezone()
[1] "America/Los_Angeles"

> (grep("PST", OlsonNames(), value=TRUE))
[1] "PST8PDT"         "SystemV/PST8"    "SystemV/PST8PDT"
https://www.r-bloggers.com/2018/07/a-tour-of-timezones-troubles-in-r/ says lubridate will complain if given an invalid tz, though I don't see that explicitly in the current man page https://lubridate.tidyverse.org/reference/parse_date_time.html.  As shown above, parse_date_time() does not complain about the timezone, and does use it to get the correct offset.
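Since parse_date_time() does not complain, the validity check can be made explicit before parsing. A minimal sketch, assuming that membership in OlsonNames() is the right test for "legal" on the platform in question; is_known_tz() is a hypothetical helper, not part of base R or lubridate.

```r
# Hypothetical helper (not part of base R or lubridate): TRUE when `tz`
# is a name listed in the system's Olson/IANA time zone database.
is_known_tz <- function(tz) tz %in% OlsonNames()

known <- is_known_tz(c("PST", "America/Los_Angeles"))
print(known)
# [1] FALSE  TRUE
```

A guard such as stopifnot(is_known_tz(tz)) ahead of the parse call would turn the silent mislabeling into an immediate error.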

Using America/Los_Angeles produces the expected results:
> r4 <- parse_date_time("1970-01-01 00:01:00", "ymd HMS", tz=Sys.timezone())
> r4
[1] "1970-01-01 00:01:00 PST"  # still prints PST.  This time it's true!
> as.numeric(r4)
[1] 28860

I suppose I can just use "America/Los_Angeles" as the time zone; this would have the advantage of making all my timezones the same, which is apparently what R requires for a vector of datetimes.  But the behavior seems odd, and the "fix" also requires me to ignore the time zone specified in my inputs, which look like "2022-03-01 15:54:30 PST" or PDT, depending on the time of year.
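One way to honor the abbreviation in the input instead of ignoring it is sketched below in base R. It assumes every stamp ends in PST or PDT and that those mean fixed offsets of -8 and -7 hours from UTC; parse_pacific() is a hypothetical helper, not a lubridate or base R function. The wall-clock part is parsed as if it were UTC, the offset is removed, and only then is the result labeled with the Olson name.

```r
# Hypothetical helper: parse stamps such as "2022-03-01 15:54:30 PST".
# Assumption: the only abbreviations present are PST (UTC-8) and PDT (UTC-7).
parse_pacific <- function(x) {
  offsets <- c(PST = -8, PDT = -7)       # hours relative to UTC
  abbr  <- sub("^.* ", "", x)            # trailing abbreviation
  stamp <- sub(" [A-Z]+$", "", x)        # wall-clock part without it
  stopifnot(all(abbr %in% names(offsets)))
  # Read the wall-clock fields as UTC, then shift by the stated offset.
  utc <- as.POSIXct(stamp, tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
  out <- utc - unname(offsets[abbr]) * 3600
  attr(out, "tzone") <- "America/Los_Angeles"   # one tz for the whole vector
  out
}

parsed <- parse_pacific(c("2022-03-01 15:54:30 PST",
                          "2022-07-01 15:54:30 PDT"))
print(format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
# [1] "2022-03-01 23:54:30" "2022-07-01 22:54:30"
```

Unlike dropping the suffix and parsing with tz="America/Los_Angeles", this keeps the two 01:30 readings on the night the clocks go back distinct, because the abbreviation decides the offset.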

1. Why this strange behavior in which PST or PDT is used to construct the proper offset from UTC, and then kind of forgotten on output?
2. Is this a bug in lubridate or base POSIXct, particularly its print routine?

My theory on 1 is that lubridate understands PST and constructs an appropriate UTC time.  POSIXct time does not understand a tz of "PST" and so prints out the UTC value for the time, "decorating" it with the not understood tz value.  

For 2, on one hand, lubridate is constructing POSIXct dates with invalid tz values; lubridate probably shouldn't.  On the other hand, POSIXct is printing a UTC time but labeling it with a tz it doesn't understand, so it looks as if it's in that local time even though it isn't.  In the context above that seems like a bug, but it's possible that a lot of code depends on it.

Under these theories, the problems arise only because the set of tz values understood by lubridate differs from the set understood by POSIXct.
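The printing half of this theory can be tested without lubridate. The sketch below builds the same instant (28860 seconds after the epoch) in base R and formats it under different tz values. What the final print shows for the unrecognized "PST" is platform-dependent, and some platforms may also emit a warning, so no output is claimed for it here.

```r
# 28860 seconds after the epoch, i.e. 1970-01-01 08:01:00 UTC.
x <- as.POSIXct(28860, origin = "1970-01-01", tz = "UTC")

utc_view <- format(x, "%Y-%m-%d %H:%M:%S", tz = "UTC")
la_view  <- format(x, "%Y-%m-%d %H:%M:%S", tz = "America/Los_Angeles")
print(c(utc = utc_view, la = la_view))
#                   utc                    la
# "1970-01-01 08:01:00" "1970-01-01 00:01:00"

# Relabel the same instant with a tz that OlsonNames() does not list.
# If the theory holds, the clock fields match utc_view and only the
# label differs (platform-dependent).
attr(x, "tzone") <- "PST"
print(x)
```

If the last line prints 08:01:00 with a PST label, the parsed instant is right and only the display is misleading.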

R 3.5.2
lubridate 1.7.4
Debian GNU/Linux 10 aka buster (amd64 flavor)

Ross Boylan
