[R-pkg-devel] Issue handling datetimes: possible differences between computers

Jeff Newmiller
Mon Oct 10 04:31:03 CEST 2022


... which is why tidyverse functions and Python datetime handling irk me so much.

Is tidyverse time handling intrinsically broken? The standard practice there is to read times as UTC and then use force_tz to fix the "mistake". Python does the same.
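
To make the concern concrete, here is a minimal base-R sketch (not tidyverse code; the variable names are made up for illustration) of why the zone in which a wall-clock string is read matters. The same text denotes two different instants depending on the zone assumed at parse time.

```r
# Same wall-clock text, read under two different time zone assumptions.
wall <- "2021-10-01 00:00:00"
fmt <- "%Y-%m-%d %H:%M:%S"

as_utc    <- as.POSIXct(wall, tz = "UTC", format = fmt)
as_berlin <- as.POSIXct(wall, tz = "Europe/Berlin", format = fmt)

# Berlin is on CEST (UTC+2) on this date, so midnight there comes
# two hours before midnight UTC.
offset_secs <- as.numeric(as_utc) - as.numeric(as_berlin)
offset_secs
```

Reading as UTC first means a later step has to undo that offset, and the offset itself depends on the date.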

On October 9, 2022 6:57:06 PM PDT, Simon Urbanek <simon.urbanek using R-project.org> wrote:
>Alexandre,
>
>it's better to parse the timestamp in correct timezone:
>
>> foo = as.POSIXlt("2021-10-01", "UTC")
>> as.POSIXct(as.character(foo), "Europe/Berlin")
>[1] "2021-10-01 CEST"
>
>The issue stems from the fact that you are treating your timestamp as UTC (which it is not) while wanting to interpret the same wall-clock values in a different time zone. The DST flag depends on the date (isdst is 0 or 1 depending on the day), and the POSIXlt object does not carry the right value because you only attached the new time zone without updating the flag:
>
>> str(unclass(as.POSIXlt(foo, "Europe/Berlin")))
>List of 9
> $ sec  : num 0
> $ min  : int 0
> $ hour : int 0
> $ mday : int 1
> $ mon  : int 9
> $ year : int 121
> $ wday : int 5
> $ yday : int 273
> $ isdst: int 0
> - attr(*, "tzone")= chr "Europe/Berlin"
>
>Note that isdst is still 0, carried over from the UTC entry (UTC has no DST), even though that date actually falls within DST (CEST) in Europe/Berlin. Compare that to the correctly parsed POSIXlt:
>
>> str(unclass(as.POSIXlt(as.character(foo), "Europe/Berlin")))
>List of 11
> $ sec   : num 0
> $ min   : int 0
> $ hour  : int 0
> $ mday  : int 1
> $ mon   : int 9
> $ year  : int 121
> $ wday  : int 5
> $ yday  : int 273
> $ isdst : int 1
> $ zone  : chr "CEST"
> $ gmtoff: int NA
> - attr(*, "tzone")= chr "Europe/Berlin"
>
>where isdst is 1, since that date is indeed in DST. The OS difference seems to be that Linux respects the isdst information from POSIXlt while Windows and macOS ignore it. This behavior is documented:
>
>     At all other times ‘isdst’ can be deduced from the
>     first six values, but the behaviour if it is set incorrectly is
>     platform-dependent.
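
A stale flag of this kind can be detected without relying on platform behaviour by re-parsing the wall-clock text in the target zone and comparing the two isdst values. This is only a sketch: `fresh_isdst` is a hypothetical helper name, not an R function, and it assumes the time zone database knows "Europe/Berlin".

```r
# Hypothetical helper: recompute isdst for the wall-clock fields of `x`
# as they would be interpreted in time zone `tz`.
fresh_isdst <- function(x, tz) {
  fmt <- "%Y-%m-%d %H:%M:%S"
  wall <- format(x, fmt)  # text form of the stored fields
  as.POSIXlt(wall, tz = tz, format = fmt)$isdst
}

foo <- as.POSIXlt("2021-10-01", tz = "UTC")
stored <- foo$isdst                          # flag carried by the UTC object
fresh  <- fresh_isdst(foo, "Europe/Berlin")  # flag after re-parsing in Berlin
stale  <- !identical(as.integer(stored), as.integer(fresh))
stale
```

When `stale` is TRUE, converting the object directly with a new `tz` falls under the platform-dependent case quoted above.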
>
>You can re-set isdst to -1 to make sure R will try to determine it:
>
>> foo$isdst = -1L
>> as.POSIXct(foo, "Europe/Berlin")
>[1] "2021-10-01 CEST"
>
>So, generally, you cannot simply change the time zone in a POSIXlt. Don't pretend the time is in UTC if it's not: you have to re-parse or re-compute the timestamps for the result to be reliable, or else the DST flag will be wrong.
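
The re-parse route might be wrapped up as follows. This is a sketch under stated assumptions, not an established API: `relabel_wall_clock` is a made-up name, and going through text drops fractional seconds.

```r
# Hypothetical helper: reinterpret the wall-clock fields of a POSIXlt in
# another time zone by going through text, so no stale isdst is reused.
# Fractional seconds are dropped by the "%S" format.
relabel_wall_clock <- function(x, tz) {
  fmt <- "%Y-%m-%d %H:%M:%S"
  as.POSIXct(format(x, fmt), tz = tz, format = fmt)
}

foo <- as.POSIXlt("2021-10-01", tz = "UTC")
bar <- relabel_wall_clock(foo, "Europe/Berlin")
format(bar, "%Y-%m-%d %H:%M:%S %Z")
```

Because the isdst field never reaches the conversion, the result should not depend on how the operating system treats an inconsistent flag. Wall-clock times that fall inside a DST gap or overlap remain ambiguous under any approach.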
>
>Cheers,
>Simon
>
>
>> On 10/10/2022, at 1:14 AM, Alexandre Courtiol <alexandre.courtiol using gmail.com> wrote:
>> 
>> Hi R pkg developers,
>> 
>> We are facing a datetime handling issue which manifests itself in a
>> package we are working on.
>> 
>> In context, we noticed that reading datetime info from an Excel file
>> resulted in different data depending on the computer we used.
>> 
>> We are aware that timezone and regional settings are general sources
>> of trouble, but the code we are using was trying to circumvent this.
>> We went only as far as figuring out that the issue happens when
>> converting a POSIXlt into a POSIXct.
>> 
>> Please find below a minimal reproducible example where `foo` is
>> converted to `bar` on two different computers.
>> `foo` is a POSIXlt with a defined time zone and upon conversion to a
>> POSIXct, despite using a set time zone, we end up with `bar` being
>> different on Linux and on a Windows machine.
>> 
>> We noticed that the difference emerges from the internal call
>> `.Internal(as.POSIXct())` within `as.POSIXct.POSIXlt()`.
>> We also noticed that the internal function in R actually calls
>> getenv("TZ") within C, which is probably what explains where the
>> difference comes from.
>> 
>> Such a behaviour is probably expected and not a bug, but what would be
>> the strategy to convert a POSIXlt into a POSIXct that would not be
>> machine dependent?
>> 
>> We finally noticed that depending on the datetime used as a starting
>> point and on the time zone used when calling `as.POSIXct()`, we
>> sometimes have a difference between computers and sometimes not...
>> which adds to our puzzlement.
>> 
>> Many thanks.
>> Alex & Liam
>> 
>> 
>> ``` r
>> ## On Linux
>> foo <- structure(list(sec = 0, min = 0L, hour = 0L, mday = 1L, mon =
>> 9L, year = 121L, wday = 5L, yday = 273L, isdst = 0L),
>>                 class = c("POSIXlt", "POSIXt"), tzone = "UTC")
>> 
>> bar <- as.POSIXct(foo, tz = "Europe/Berlin")
>> 
>> bar
>> #> [1] "2021-10-01 01:00:00 CEST"
>> 
>> dput(bar)
>> #> structure(1633042800, class = c("POSIXct", "POSIXt"), tzone = "Europe/Berlin")
>> ```
>> 
>> ``` r
>> ## On Windows
>> foo <- structure(list(sec = 0, min = 0L, hour = 0L, mday = 1L, mon =
>> 9L, year = 121L, wday = 5L, yday = 273L, isdst = 0L),
>>                 class = c("POSIXlt", "POSIXt"), tzone = "UTC")
>> 
>> bar <- as.POSIXct(foo, tz = "Europe/Berlin")
>> 
>> bar
>> #> [1] "2021-10-01 CEST"
>> 
>> dput(bar)
>> #> structure(1633046400, class = c("POSIXct", "POSIXt"), tzone = "Europe/Berlin")
>> ```
>> 
>> -- 
>> Alexandre Courtiol, www.datazoogang.de
>> 
>> ______________________________________________
>> R-package-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>> 

-- 
Sent from my phone. Please excuse my brevity.


