[Rd] Bug with `[<-.POSIXlt` on specific OSes

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Wed Oct 12 12:47:50 CEST 2022


>>>>> Martin Maechler 
>>>>>     on Wed, 12 Oct 2022 10:17:28 +0200 writes:

>>>>> Kurt Hornik 
>>>>>     on Tue, 11 Oct 2022 16:44:13 +0200 writes:

>>>>> Davis Vaughan writes:
    >>> I've got a bit more information about this one. It seems like it
    >>> (only? not sure) appears when `TZ = "UTC"`, which is why I didn't see
    >>> it before on my Mac, which defaults to `TZ = ""`. I think this is at
    >>> least explainable by the fact that those "optional" fields aren't
    >>> technically needed when the time zone is UTC.

    >> Exactly.  Debugging `[<-.POSIXlt` with

    >> x <- as.POSIXlt(as.POSIXct("2013-01-31", tz = "America/Chicago"))
    >> Sys.setenv(TZ = "UTC")
    >> x[1] <- NA

    >> shows we get into

    >> value <- unclass(as.POSIXlt(value))
    >> if (ici) {
    >>     for (n in names(x)) names(x[[n]]) <- nms
    >> }
    >> for (n in names(x)) x[[n]][i] <- value[[n]]

    >> where

    >> Browse[2]> names(value)
    >> [1] "sec"   "min"   "hour"  "mday"  "mon"   "year"  "wday"  "yday"  "isdst"
    >> Browse[2]> names(x)
    >> [1] "sec"    "min"    "hour"   "mday"   "mon"    "year"   "wday"   "yday"  
    >> [9] "isdst"  "zone"   "gmtoff"

    >> Without having looked at the code, the docs say

    >> ‘zone’ (Optional.) The abbreviation for the time zone in force at
    >> that time: ‘""’ if unknown (but ‘""’ might also be used for
    >> UTC).

    >> ‘gmtoff’ (Optional.) The offset in seconds from GMT: positive
    >> values are East of the meridian.  Usually ‘NA’ if unknown,
    >> but ‘0’ could mean unknown.

    >> so perhaps we should fill with the values for the unknown case?

    >> -k
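
A minimal sketch of the "fill with the values for the unknown case" idea, not the fix that went into R. The helper name `pad_optional` is hypothetical, and the 9-component `value` list is built by hand (shaped like the `names(value)` output above) so that the example does not depend on the TZ setting or the R version:

```r
# Hypothetical helper (not part of base R): add the optional "zone" and
# "gmtoff" components to an unclassed POSIXlt list when they are absent,
# using the documented "unknown" values "" and NA.
pad_optional <- function(value) {
  n <- length(value[["sec"]])
  if (is.null(value[["zone"]]))   value[["zone"]]   <- rep("", n)
  if (is.null(value[["gmtoff"]])) value[["gmtoff"]] <- rep(NA_integer_, n)
  value
}

# Hand-built 9-component list standing in for unclass(as.POSIXlt(NA)).
value <- list(sec = NA_real_, min = NA_integer_, hour = NA_integer_,
              mday = NA_integer_, mon = NA_integer_, year = NA_integer_,
              wday = NA_integer_, yday = NA_integer_, isdst = -1L)

x <- unclass(as.POSIXlt(as.POSIXct("2013-01-31", tz = "America/Chicago")))
value <- pad_optional(value)

# Same loop as in `[<-.POSIXlt`; every component of x now has a counterpart.
for (n in names(x)) x[[n]][1] <- value[[n]]
```

With the padding in place the loop no longer hits a NULL on the right-hand side; whether "" and NA are the right fill values in every case is exactly the open question above.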

    > Well,

    > I think you both know  I'm in the midst of dealing with these
    > issues, to fix both

    > [.POSIXlt  and
    > [<-.POSIXlt

    > Yes, one needs a way to not only "fill" the partially filled
    > entries but also to *normalize* out-of-range values
    > (say negative seconds, minutes > 60, etc)
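
To illustrate what "out-of-range" means here, a small sketch that sets such values by hand and then normalizes them at the R level by a round trip through POSIXct. The round trip is only a stand-in for illustration, not the C-level approach described in this message:

```r
x <- as.POSIXlt("2022-10-12 12:00:00", tz = "UTC")
x$min <- 75L   # deliberately out of range (valid minutes are 0..59)
x$sec <- -1    # negative seconds

# Converting to POSIXct and back yields in-range components again.
y <- as.POSIXlt(as.POSIXct(x))
format(y, "%Y-%m-%d %H:%M:%S")
#> [1] "2022-10-12 13:14:59"
```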

    > All this is available in our C code, but not on the R level,
    > so yesterday, I wrote a C function to be called via .Internal(.)
    > from a new R function that provides this.

    > Provisionally called

    > balancePOSIXlt()

    > because it both balances the 9 to 11 list components of POSIXlt
    > and puts all numbers of (sec, min, hour, mday, mon)
    > into a correct range (and also correctly computes the wday and yday
    > numbers), but I'm happy for proposals of better names.
    > I had contemplated  validatePOSIXlt() as alternative, but then
    > dismissed that as in some sense we now do agree that
    > "imbalanced" POSIXlt's are not really invalid ..

    > .. and yes, to Davis:  Even though I've spent so many hours with
    > POSIXlt, POSIXct and Date during the last week, I'm still
    > surprised more often than I like by the effects of timezone
    > settings there.

    > Martin

I have committed the new R and C code now, defining  balancePOSIXlt(),
to get feedback from the community.

I've extended the documentation in  help(DateTimeClasses),
and notably factored out the description
of  POSIXlt  mentioning the  "ragged" and "out-of-range" cases.

This needs more testing and experiments, and I have not
announced it in  NEWS  yet.

Planned next is to use it in  [.POSIXlt and [<-.POSIXlt
so they will work correctly.

But please share your thoughts, propositions, ...

Martin


    >>> I can reproduce this now on my personal Mac:

    >>> ```
    >>> x <- as.POSIXlt(as.POSIXct("2013-01-31", tz = "America/Chicago"))
    >>> Sys.setenv(TZ = "")
    >>> x[1] <- NA
    >>> x
    >>> #> [1] NA

    >>> x <- as.POSIXlt(as.POSIXct("2013-01-31", tz = "America/Chicago"))
    >>> Sys.setenv(TZ = "America/New_York")
    >>> x[1] <- NA
    >>> x
    >>> #> [1] NA

    >>> x <- as.POSIXlt(as.POSIXct("2013-01-31", tz = "America/Chicago"))
    >>> Sys.setenv(TZ = "UTC")
    >>> x[1] <- NA
    >>> #> Error in x[[n]][i] <- value[[n]] : replacement has length zero
    >>> x
    >>> #> [1] "2013-01-31 CST"
    >>> ```
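
For anyone who wants to try several TZ settings without leaving the session's TZ changed, the reproduction can be wrapped as below. This is only a sketch; the helper name `try_assign_na` is made up for illustration, and on R versions where the bug is fixed every entry should simply be "ok":

```r
# Hypothetical helper: run the reproduction under a given TZ, return "ok"
# or the error message, and restore the previous TZ setting afterwards.
try_assign_na <- function(tz) {
  old <- Sys.getenv("TZ", unset = NA)
  on.exit(if (is.na(old)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old),
          add = TRUE)
  x <- as.POSIXlt(as.POSIXct("2013-01-31", tz = "America/Chicago"))
  Sys.setenv(TZ = tz)
  tryCatch({ x[1] <- NA; "ok" }, error = function(e) conditionMessage(e))
}

res <- vapply(c("America/New_York", "UTC"), try_assign_na, character(1))
res
```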

    >>> Here are `sessionInfo()` and `Sys.getenv("TZ")` outputs for 3 GitHub
    >>> Actions platforms where the bug exists (note they all set `TZ = "UTC"`!):

    >>> Linux:

    >>> ```
    >>>> sessionInfo()
    >>> R version 4.2.1 (2022-06-23)
    >>> Platform: x86_64-pc-linux-gnu (64-bit)
    >>> Running under: Ubuntu 18.04.6 LTS

    >>> Matrix products: default
    >>> BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
    >>> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

    >>> locale:
    >>> [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
    >>> [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
    >>> [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
    >>> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

    >>> attached base packages:
    >>> [1] stats     graphics  grDevices utils     datasets  methods   base

    >>> loaded via a namespace (and not attached):
    >>> [1] compiler_4.2.1

    >>>> Sys.getenv("TZ")
    >>> [1] "UTC"
    >>> ```

    >>> Mac:

    >>> ```
    >>>> sessionInfo()
    >>> R version 4.2.1 (2022-06-23)
    >>> Platform: x86_64-apple-darwin17.0 (64-bit)
    >>> Running under: macOS Big Sur ... 10.16

    >>> Matrix products: default
    >>> BLAS:
    >>> /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
    >>> LAPACK:
    >>> /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

    >>> locale:
    >>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    >>> attached base packages:
    >>> [1] stats     graphics  grDevices utils     datasets  methods   base

    >>> loaded via a namespace (and not attached):
    >>> [1] compiler_4.2.1

    >>>> Sys.getenv("TZ")
    >>> [1] "UTC"
    >>> ```

    >>> Windows:
    >>> This is the best I can get you, sorry (remote worker issues), but note that
    >>> it does also say `tz UTC` like the others.

    >>> ```
    >>> version R version 4.2.1 (2022-06-23 ucrt)
    >>> os Windows Server x64 (build 20348)
    >>> system x86_64, mingw32
    >>> ui RTerm
    >>> language (EN)
    >>> collate English_United States.utf8
    >>> ctype English_United States.utf8
    >>> tz UTC
    >>> date 2022-10-11
    >>> ```

    >>> And here is my Mac where the bug doesn't show up by default because `TZ =
    >>> ""`:

    >>> ```
    >>>> sessionInfo()
    >>> R version 4.2.1 (2022-06-23)
    >>> Platform: x86_64-apple-darwin17.0 (64-bit)
    >>> Running under: macOS Big Sur ... 10.16

    >>> Matrix products: default
    >>> BLAS:
    >>> /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
    >>> LAPACK:
    >>> /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

    >>> locale:
    >>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    >>> attached base packages:
    >>> [1] stats     graphics  grDevices utils     datasets  methods   base

    >>> loaded via a namespace (and not attached):
    >>> [1] compiler_4.2.1

    >>>> Sys.getenv("TZ")
    >>> [1] ""

    >>>> Sys.timezone()
    >>> [1] "America/New_York"
    >>> ```

    >>> -Davis


    >>> On Thu, Oct 6, 2022 at 9:33 AM Davis Vaughan <davis using rstudio.com> wrote:

    >>>> Hi all,
    >>>> 
    >>>> I have found another POSIXlt bug while I've been fiddling around with it.
    >>>> This one only appears on specific OSes, because it has to do with the fact
    >>>> that the `gmtoff` field is optional, and isn't always used on all OSes. It
    >>>> also doesn't seem to be specific to r-devel, I think it has been there
    >>>> awhile.
    >>>> 
    >>>> Here is the bug:
    >>>> 
    >>>> ```
    >>>> x <- as.POSIXlt(as.POSIXct("2013-01-31", tz = "America/Chicago"))
    >>>> 
    >>>> # Oh no!
    >>>> x[1] <- NA
    >>>> #> Error in x[[n]][i] <- value[[n]] : replacement has length zero
    >>>> ```
    >>>> 
    >>>> If you look at the objects, you can see that `x` has a `gmtoff` field, but
    >>>> `NA` (when converted to POSIXlt, which is what `[<-.POSIXlt` does) does not:
    >>>> 
    >>>> ```
    >>>> unclass(x)
    >>>> #> $sec
    >>>> #> [1] 0
    >>>> #>
    >>>> #> $min
    >>>> #> [1] 0
    >>>> #>
    >>>> #> $hour
    >>>> #> [1] 0
    >>>> #>
    >>>> #> $mday
    >>>> #> [1] 31
    >>>> #>
    >>>> #> $mon
    >>>> #> [1] 0
    >>>> #>
    >>>> #> $year
    >>>> #> [1] 113
    >>>> #>
    >>>> #> $wday
    >>>> #> [1] 4
    >>>> #>
    >>>> #> $yday
    >>>> #> [1] 30
    >>>> #>
    >>>> #> $isdst
    >>>> #> [1] 0
    >>>> #>
    >>>> #> $zone
    >>>> #> [1] "CST"
    >>>> #>
    >>>> #> $gmtoff
    >>>> #> [1] -21600
    >>>> #>
    >>>> #> attr(,"tzone")
    >>>> #> [1] "America/Chicago" "CST"             "CDT"
    >>>> 
    >>>> unclass(as.POSIXlt(NA))
    >>>> #> $sec
    >>>> #> [1] NA
    >>>> #>
    >>>> #> $min
    >>>> #> [1] NA
    >>>> #>
    >>>> #> $hour
    >>>> #> [1] NA
    >>>> #>
    >>>> #> $mday
    >>>> #> [1] NA
    >>>> #>
    >>>> #> $mon
    >>>> #> [1] NA
    >>>> #>
    >>>> #> $year
    >>>> #> [1] NA
    >>>> #>
    >>>> #> $wday
    >>>> #> [1] NA
    >>>> #>
    >>>> #> $yday
    >>>> #> [1] NA
    >>>> #>
    >>>> #> $isdst
    >>>> #> [1] -1
    >>>> #>
    >>>> #> attr(,"tzone")
    >>>> #> [1] "UTC"
    >>>> ```
    >>>> 
    >>>> The problem seems to be that `[<-.POSIXlt` assumes that if the field was
    >>>> there in `x` then it must also be there in `value`:
    >>>> 
    >>>> https://github.com/wch/r-source/blob/e10a971dee6a0ab851279c183cc21954d66b3be4/src/library/base/R/datetime.R#L1303-L1304
    >>>> 
    >>>> But this isn't the case for the `NA` value that was converted to POSIXlt.
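
The mismatch described above can be checked directly. A small diagnostic sketch: the variable names are only illustrative, and the result of `setdiff()` depends on the platform, the TZ setting and the R version, so it may well be empty on a given machine:

```r
x     <- as.POSIXlt(as.POSIXct("2013-01-31", tz = "America/Chicago"))
value <- as.POSIXlt(NA)

# Components present in `x` but absent from `value`; on affected setups this
# is expected to list "zone" and "gmtoff", elsewhere it may be empty.
missing_fields <- setdiff(names(unclass(x)), names(unclass(value)))
missing_fields

# Indexing a list by an absent name gives NULL, and assigning NULL into a
# vector element is an error, which is the failure seen in the loop.
v <- 1:3
res <- tryCatch(v[1] <- NULL, error = function(e) e)
inherits(res, "error")
```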
    >>>> 
    >>>> I can't reproduce this on my personal Mac, but it affects the Linux, Mac,
    >>>> and Windows machines we use for the lubridate CI checks through GitHub
    >>>> Actions.
    >>>> 
    >>>> Thanks,
    >>>> Davis
    >>>> 



