[Rd] as.character.POSIXt in R devel

Martin Maechler maechler at stat.math.ethz.ch
Mon Oct 3 18:58:48 CEST 2022


>>>>> Martin Maechler 
>>>>>     on Mon, 3 Oct 2022 14:46:08 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>>     on Sun, 2 Oct 2022 08:42:50 +0000 (UTC) writes:

    >> With r82904, 'as.character.POSIXt' in R devel is changed. The NEWS item:

    >> as.character(<POSIXt>) now behaves more in line with the
    >> methods for atomic vectors such as numbers, and is no longer
    >> influenced by options().

 [..............]

    >> * Wrong:

    >> The result is wrong when as.character(fs[n0]) has scientific notation.

    > yes, you are right.  This is a lapsus I will fix.

    >> Example (modified from https://bugs.r-project.org/show_bug.cgi?id=9819):
    >> op <- options(scipen = 0, OutDec = ".") # (default setting)
    >> x <- as.POSIXlt("2007-07-27 16:11:03.000002")
    >> as.character(x)
    >> # "2007-07-27 16:11:03.99999999983547e-06"
    >> as.character(x$sec - trunc(x$sec))
    >> # "1.99999999983547e-06"
    >> options(op)

    >> 'as.character.POSIXt' could temporarily set option 'scipen' large enough to prevent scientific notation in as.character(fs[n0]) .

    > Yes, something like that.

I have now committed a version of datetime.R (svn rev 83010)
which no longer depends on option 'OutDec' (but has gained an 'OutDec' argument)
and which has a new 'digits' argument that defaults
to 14 for POSIXlt and
to  6 for POSIXct; the user can choose a different value.

Also, it now uses the equivalent of  as.character(round(x$sec, digits))
(in case the seconds need to be shown)  which also solves the
following  "too much precision"  problem.
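
To illustrate why working on the whole seconds value helps, here is a minimal sketch. The helper name sec_string() is hypothetical and is not the code committed to datetime.R; it only mimics "round, convert, then add a leading zero as necessary" for ordinary values of the seconds.

```r
# Converting only the fractional part invites scientific notation,
# because the fraction is a very small number:
sec  <- 3.000002
frac <- sec - trunc(sec)
uses_sci <- grepl("e-", as.character(frac), fixed = TRUE)

# Hypothetical helper (an assumption for illustration, not R's implementation):
# round the whole seconds value, convert it, and pad values below 10.
sec_string <- function(sec, digits = 6L) {
  s <- as.character(round(sec, digits))
  ifelse(sec < 10, paste0("0", s), s)
}

sec_string(3.000002)  # "03.000002"
sec_string(56.3)      # "56.3"
```

The sketch does not cover seconds smaller than about 1e-4, where as.character() on the rounded value may itself switch to scientific notation.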

    >> * Too much precision:

    >> In some cases with fractional seconds and seconds close to 60, the result has many decimal places although there is an accurate representation with fewer decimal places. It is actually OK, just unpleasant.

    > I agree that is unpleasant.
    > To someone else I had written that we also may need to improve
    > the number of decimals shown here.
    > The design has been that it should be "full precision"
    > as it is for  as.character(<numbers>)

    > Now, we know that POSIXct cannot be very precise (in its
    > fractional seconds) but that is very different for POSIXlt where
    > fractional seconds may have 14 digits after the decimal point.

    > Ideally we could *store* with the POSIXlt object if it was
    > produced from a POSIXct one, and hence have only around 6 valid digits
    > (after the dec.) or not.  As we cannot currently store/save that
    > info, we kept using "full" precision which may be much more than
    > is sensible.

    >> Example (modified from https://bugs.r-project.org/show_bug.cgi?id=14693):
    >> op <- options(scipen = 0, OutDec = ".") # (default setting)
    >> x <- as.POSIXlt("2011-10-01 12:34:56.3")
    >> x$sec == 56.3 # TRUE

    > [which may be typical, but may also be platform dependent]

    >> print(x$sec, 17)
    >> # [1] 56.299999999999997
    >> as.character(x)
    >> # "2011-10-01 12:34:56.299999999999997"
    >> format(x, "%Y-%m-%d %H:%M:%OS1") # short and accurate
    >> # "2011-10-01 12:34:56.3"
    >> ct <- as.POSIXct(x, tz = "UTC")
    >> identical(ct,
    >> as.POSIXct("2011-10-01 12:34:56.3", tz = "UTC"))
    >> # TRUE
    >> print(as.numeric(ct), 17)
    >> # [1] 1317472496.3
    >> lct <- as.POSIXlt(ct)
    >> lct$sec == 56.3 # FALSE
    >> print(lct$sec, 17)
    >> # [1] 56.299999952316284
    >> as.character(ct)
    >> # "2011-10-01 12:34:56.299999952316284"
    >> options(op)

    >> The "POSIXct" case is a little different because some precision is already lost after conversion to "POSIXct".

    > yes, indeed.
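
The precision loss can be quantified with a small sketch. It assumes IEEE double precision (53-bit significand) and uses the numeric value of 'ct' from the example above; the variable names are only for illustration.

```r
# Seconds since the epoch for "2011-10-01 12:34:56.3" UTC (from the example)
ct_num <- 1317472496.3

# Spacing between adjacent doubles of this magnitude:
# the value lies in [2^30, 2^31), so one unit in the last place is 2^(30 - 52)
ulp <- 2^(floor(log2(ct_num)) - 52)
ulp   # 2^-22, about 2.4e-07

# The fractional second recovered from the stored number is off by less than
# one ulp, so only about 6 digits after the decimal point are meaningful
frac <- ct_num - trunc(ct_num)
abs(frac - 0.3) < ulp
```

This is consistent with the default of 6 digits for POSIXct mentioned above, while the seconds component of a POSIXlt object is a number below 62 and so keeps many more digits after the decimal point.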

    >> In 'as.character.POSIXt', using 'as.character' on the seconds (not separating the fractional part) might be good enough, but a leading zero must be added as necessary.

    > I think you are right: that may definitely be better...

indeed; part of my commit.

    >> * Different from 'format':

    >> - With fractional seconds, the result is influenced by option 'OutDec'.

This has been solved, too:
for the "freaks", there is now an explicit  'OutDec = *' argument,
but its default does *not* depend on options()!


    > Thank you.  I was not aware of that.
    > The reason "of course" being that  as.character(<numeric>)  is
    > *also* depending on option  OutDec.

    > I would say that is clearly wrong...  and I think we should
    > strongly consider changing that:

    > 'OutDec' should influence print()ing and format()ing  but should
    > *not* influence  as.character()  at least not for basic R types/objects.
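
As a small sketch of the distinction drawn here: format() takes its default decimal mark from option 'OutDec', and an explicit 'decimal.mark' overrides it. Whether as.character() on a number also follows 'OutDec' depends on the R version, so the sketch makes no claim about that.

```r
op <- options(OutDec = ",")          # temporarily change the option
with_comma <- format(1.5)            # "1,5": format() follows OutDec
explicit   <- format(1.5, decimal.mark = ".")  # "1.5": explicit argument wins
options(op)                          # restore the previous setting
```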


    >> - From "Printing years" in ?strptime: "For years 0 to 999 most OSes pad with zeros or spaces to 4 characters, and Linux outputs just the number."
    >> Because (1900 + x$year) is formatted with %d in 'as.character.POSIXt', years 0 to 999 are output without padding. It is different from 'format' in OSes other than Linux.

    > Good point.  This should be  amended.

Not yet.  Actually, I'm no longer sure this needs any action.
I find it somewhat natural that

> (CharleMagne.crowned <- as.POSIXlt(ISOdate(774,7,10)))
[1] "774-07-10 12:00:00 GMT"
> as.character(CharleMagne.crowned)
[1] "774-07-10 12:00:00"
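
For comparison, a minimal sketch of the two year formats under discussion, using plain sprintf() on the date parts from the example (this is not the code of either method):

```r
# "%d" leaves the year unpadded; "%04d" pads it with zeros to 4 characters
unpadded <- sprintf("%d-%02d-%02d",   774L, 7L, 10L)  # "774-07-10"
padded   <- sprintf("%04d-%02d-%02d", 774L, 7L, 10L)  # "0774-07-10"
```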



    >> * Behavior with "improper" "POSIXlt" object:

    >> - "POSIXlt" object with out-of-bounds components is not normalized.

    >> Example (modified from regr.tests-1d.R):
    >> op <- options(scipen = 0) # (default setting)
    >> x <- structure(
    >> list(sec = 10000, min = 59L, hour = 18L,
    >> mday = 6L, mon = 11L, year = 116L,
    >> wday = 2L, yday = 340L,
    >> isdst = 0L, zone = "CET", gmtoff = 3600L),
    >> class = c("POSIXlt", "POSIXt"), tzone = "CET")
    >> as.character(x)
    >> # "2016-12-06 18:59:10000"
    >> format(x)
    >> # "2016-12-06 21:45:40"
    >> options(op)


    > Yes, we knew that  and were not too happy about it, but also not
    > too unhappy:
    > After all,  help(DateTimeClasses)
    > clearly explains
    > what POSIXlt objects should look like:

    > -------------------------------------------------------------------
    > Class ‘"POSIXlt"’ is a named list of vectors representing

    > ‘sec’ 0-61: seconds.
    > ‘min’ 0-59: minutes.
    > ‘hour’ 0-23: hours.
    > ‘mday’ 1-31: day of the month
    > ‘mon’ 0-11: months after the first of the year.
    > ‘year’ years since 1900.
    > ‘wday’ 0-6 day of the week, starting on Sunday.
    > ‘yday’ 0-365: day of the year (365 only in leap years).

    > ‘isdst’ Daylight Saving Time ... ... ...
    > ................................
    > ................................

    > -------------------------------------------------------------------

    > We have been aware that as.character() assumes the above specification,
    > even though other R functions, notably format() which uses
    > internal (C level; either system (OS) or R's own) strptime() do
    > arithmetic (modulo 60, then modulo 24, then modulo month length)
    > to compute the date "used".

    > Allowing such  "un-normalized" / out-of-bound  POSIXlt objects
    > in R has not been documented AFAICS, and has the consequence
    > that two different POSIXlt objects may correspond to the exact
    > same time. 

    > This may be something worth discussing.
    > In some sense we are discussing how the "POSIXlt" class is defined
    > (even though an S3 class is never formally defined).

(nothing changed here)
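
The carry arithmetic described above can be sketched by hand for the example with sec = 10000. The variable names are only for illustration, and this is not the C-level code used by format():

```r
# Out-of-bounds components from the example: 18:59 plus 10000 seconds
sec0 <- 10000L; min0 <- 59L; hour0 <- 18L

min_total  <- min0  + sec0 %/% 60L        # carry whole minutes out of the seconds
hour_total <- hour0 + min_total %/% 60L   # carry whole hours out of the minutes

normalized <- sprintf("%02d:%02d:%02d",
                      hour_total %% 24L,  # hour of day
                      min_total  %% 60L,  # remaining minutes
                      sec0       %% 60L)  # remaining seconds
normalized  # "21:45:40"
```

This agrees with the time of day shown by format(x) in the example, whereas as.character(x) prints the components as they are stored.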


    >> - With "POSIXlt" object where sec, min, hour, mday, mon,
    >> and year components are not all of the same length, recycling is not handled.

This is still the case... (see below).

    > Good point.  I tend to agree that this should be improved *and* also
    > documented: AFAIK, it is also not at all documented  (or is it ??)
    > that the POSIXlt components should be thought of as recycling.

    > If we decide we want that,
    > once this is documented (and all methods/functions tested with
    > such POSIXlt) it could also allow considerably smaller
    > POSIXlt objects, e.g., when all parts are in the same year, or
    > when all seconds are 0, or ...

    >> Example (modified from regr.tests-1d.R):
    >> op <- options(scipen = 0) # (default setting)
    >> x <- structure(
    >> list(sec = c(1,  2), min = 59L, hour = 18L,
    >> mday = 6L, mon = 11L, year = 116L,
    >> wday = 2L, yday = 340L,
    >> isdst = 0L, zone = "CET", gmtoff = 3600L),
    >> class = c("POSIXlt", "POSIXt"), tzone = "CET")
    >> as.character(x)
    >> # c("2016-12-06 18:59:01", "NA NA:NA:02")
    >> format(x)
    >> # c("2016-12-06 18:59:01", "2016-12-06 18:59:02")
    >> options(op)

Note that such cases (needing recycling) are currently
*also* not handled by the simple  (and important!)  length.POSIXlt()
method:  It currently only looks at the '.$sec'
component!

So this case does need discussion, too.
I think it's unfortunate that *some* *.POSIXt methods do such
recycling, e.g. format.POSIXt,
but others do not {and the documentation does not even mention recycling}.

As mentioned, I am *pro* going in that direction;
so I would change

  length.POSIXlt <- function(x) length(unclass(x)[[1L]])

       (which only uses x$sec !)

to

  length.POSIXlt <- function(x) max(lengths(unclass(x), use.names=FALSE))

This does not allow 0-length recycling; 0-length components
should really be illegal in an otherwise non-0-length POSIXlt x.
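
A small sketch of the difference between the two definitions, using a plain list as a stand-in for an unclassed POSIXlt object. The names len_first, len_max and recycle_parts are hypothetical and only for illustration; base R's length.POSIXlt is not redefined here.

```r
# Stand-in for unclass(<POSIXlt>): all components of length 1 except 'year'
parts <- list(sec = 0, min = 0L, hour = 12L,
              mday = 1L, mon = 0L, year = c(116L, 117L, 118L))

len_first <- function(x) length(x[[1L]])                   # looks at 'sec' only
len_max   <- function(x) max(lengths(x, use.names = FALSE)) # proposed definition

len_first(parts)  # 1
len_max(parts)    # 3

# Recycle every component to the common length, rejecting 0-length components
# in an otherwise non-0-length object
recycle_parts <- function(x) {
  n <- len_max(x)
  if (n > 0L && any(lengths(x) == 0L))
    stop("0-length component in a non-0-length object")
  lapply(x, rep_len, length.out = n)
}
full <- recycle_parts(parts)
```

With this definition the object above counts as three date-times, which is what a recycling method would produce from it.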


Martin



