[Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time

Tim Taylor
Tue Aug 15 12:00:24 CEST 2023


Many thanks Martin!

I was completely overlooking the behaviour for a length 1 vector with 
00:00:00. More coffee needed for me I think.

Best

Tim


On 15/08/2023 08:58, Martin Maechler wrote:
>>>>>> Tim Taylor
>>>>>>      on Mon, 14 Aug 2023 12:26:51 +0100 writes:
>      > Martin,
>      > Thank you. Everything you have written is helpful and I admit I am likely guilty of using as.character() instead of format() in the past.
>
>      > Ignoring the above though, one thing I’m still unclear on is the special handling of zero (or rather non-zero time) seconds in the method. Is the motivation that as.character() outputs the minimum necessary information? It is clearly a very deliberate choice but the reasoning is still going a little over my head.
>
>      > Best
>      > Tim
>
> Hmm, I really don't understand what you don't understand.
> Here's some annotated R code exemplifying that indeed now,
>      as.character(x)[j] === as.character(x[j])
> but previously that was not fulfilled  {when  as.character() was
> the same as format() for POSIXct or POSIXlt}:
>
> ##-----------------------------------------------------------------------------
> x0 <- c("1975-01-01 00:00:00", "1975-01-01 15:27:00")
> t0 <- as.POSIXct(x0)
> str(t0) #  POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
> t0    #  "1975-01-01 00:00:00 CET" "1975-01-01 15:27:00 CET"
> t0[1] #  "1975-01-01 CET" <-- yes, *no* 00:00:00, in any version of R
>
> ## In R <= 4.2.x  as.character() was using format() for POSIX{ct,lt} :
> as.character(t0)    # "1975-01-01 00:00:00" "1975-01-01 15:27:00" << for R <= 4.2.x
> as.character(t0)    # "1975-01-01"          "1975-01-01 15:27:00" << for R >= 4.3.0
> as.character(t0[1]) # "1975-01-01"  {in all versions of R}
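
The property can also be checked programmatically. The following is only a minimal sketch: the helper name `elementwise_consistent` is made up for illustration, and tz = "UTC" is set explicitly so that the result does not depend on the local timezone.

```r
## Hypothetical helper (not part of base R): does f(x)[j] equal f(x[j]) for every j?
elementwise_consistent <- function(f, x) {
  whole  <- f(x)                                  # convert the whole vector at once
  single <- vapply(seq_along(x),
                   function(j) f(x[j]),           # convert each element on its own
                   character(1))
  identical(unname(whole), single)
}

x0 <- c("1975-01-01 00:00:00", "1975-01-01 15:27:00")
t0 <- as.POSIXct(x0, tz = "UTC")

## format() picks one common format for the whole vector, so the midnight
## element is rendered differently alone than alongside a non-midnight one:
elementwise_consistent(format, t0)                      # FALSE

## as.character() on plain numbers fulfills the property:
elementwise_consistent(as.character, c(0.5, 0.75, pi))  # TRUE

## For date-times the answer depends on the R version, as described above:
elementwise_consistent(as.character, t0)
```
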
>
>
> Note that indeed   as.character()  does drop redundant trailing 0s :
>
>    > as.character(c(0.5, 0.75, pi))
>    [1] "0.5"              "0.75"             "3.14159265358979"
>
> whereas format() does not (ensuring resulting strings of the same nchar(.)):
>
>    > format(      c(0.5, 0.75, pi))
>    [1] "0.500000" "0.750000" "3.141593"
>
>
>
>      >> On 14 Aug 2023, at 09:52, Martin Maechler <maechler using stat.math.ethz.ch> wrote:
>      >>
>      >> 
>      >>>
>      >>>>>>> Andy Teucher
>      >>>>>>> on Fri, 11 Aug 2023 16:07:36 -0700 writes:
>      >>
>      >>> I understand that `as.character.POSIXt()` had an overhaul in R 4.3 (https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b), and I have come across a new behaviour and I wonder if it is unintended?
>      >>
>      >> Well, as the NEWS entry says
>      >> (partly visible in the url above -- which only shows one part of
>      >> the several changes for R 4.3) :
>      >>
>      >> • as.character(<POSIXt>) now behaves more in line with the methods
>      >> for atomic vectors such as numbers, and is no longer influenced
>      >> by options().  Ditto for as.character(<Date>).  The
>      >> as.character() method gets arguments digits and OutDec with
>      >> defaults _not_ depending on options().  Use of as.character(*,
>      >> format = .) now warns.
>      >>
>      >> It was "inconsistent" to have  as.character(.) basically use format(.) for
>      >> these datetime objects.
>      >> as.character(x) for basic R types such as numbers, strings, logicals,...
>      >> fulfills the important property
>      >>
>      >> as.character(x)[j] === as.character(x[j])
>      >>
>      >> whereas that is very much different for format() where indeed,
>      >> the formatting  of  x[1]  may quite a bit depend on the other
>      >> x[j]'s values:
>      >>
>      >>> as.character(c(1, pi, pi/2^20))
>      >> [1] "1"    "3.14159265358979"   "2.99605622633914e-06"
>      >>
>      >>> format(c(1, pi, pi/2^20))
>      >> [1] "1.000000e+00" "3.141593e+00" "2.996056e-06"
>      >>> format(c(1, pi))
>      >> [1] "1.000000" "3.141593"
>      >>> format(c(1, 10))
>      >> [1] " 1" "10"
>      >>>
>      >>
>      >>
>      >>> When calling `as.character.POSIXt()` on a vector that contains elements where the time component is midnight (00:00:00), it drops the time component of that element in the resulting character vector. Previously the time component was retained:
>      >>
>      >>> In R 4.2.3:
>      >>
>      >>> ```
>      >>> R.version$version.string
>      >>> #> [1] "R version 4.2.3 (2023-03-15)"
>      >>
>      >>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
>      >>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
>      >>
>      >>> (tc <- as.character(t))
>      >>> #> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
>      >>> ```
>      >>
>      >>> In R 4.3.1:
>      >>
>      >>> ```
>      >>> R.version$version.string
>      >>> #> [1] "R version 4.3.1 (2023-06-16)"
>      >>
>      >>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
>      >>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
>      >>
>      >>> (tc <- as.character(t))
>      >>> #> [1] "1975-01-01" "1975-01-01 15:27:00"
>      >>> ```
>      >>
>      >> You should have used format()  here  or at least should do so now.
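
For anyone landing here with the same problem, one way to follow this advice is to pass an explicit format string, so that the time part is always written regardless of which elements are at midnight. A small sketch, assuming UTC is acceptable for the example; the variable names `tt`, `fmt`, `tc` and `t2` are only illustrative:

```r
tt  <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")
fmt <- "%Y-%m-%d %H:%M:%S"

## With an explicit format string the midnight element keeps its time:
tc <- format(tt, fmt, tz = "UTC")
tc
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"

## Parsing back with the same format and timezone recovers the original values
## (to whole seconds; fractional seconds would need %OS and are not covered here):
t2 <- as.POSIXct(tc, format = fmt, tz = "UTC")
all(tt == t2)
```
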
>      >>
>      >>> This has consequences when round-tripping from POSIXt ->
>      >>> character -> POSIXt,
>      >>
>      >> Well, I'd argue that such a "round trip" is not a "good idea"
>      >> anyway, as there are quite a few platform (local timezone for
>      >> one) issues, and precision is lost, notably for POSIXlt which
>      >> may be more precise than you typically get, etc.
>      >>
>      >>> since `as.POSIXct.character()` drops the time component from the entire vector if any element does not have a time component:
>      >>
>      >> Well, there *is* no as.POSIXct.character()  {but we understand what you mean}:
>      >> If you look at the help page you'd see that there's  as.POSIXlt.character()
>      >> {which is called from as.POSIXct.default()}
>      >> with a 3rd argument 'format' and a 4th argument 'tryFormats'
>      >> {and a lot more information -- the whole topic is far from trivial}.
>      >>
>      >> Now, indirectly you would want R to be "smart", i.e. the
>      >> as.POSIXlt.character() method "guess better" about what the
>      >> user wants. ...
>      >> ... and I agree that is not an unreasonable expectation, e.g.,
>      >> for your example of wanting
>      >>
>      >> c("1975-01-01", "1975-01-01 15:27:00")
>      >>
>      >> to  "work".
>      >>
>      >> as.POSIXlt.character() is well documented to be trying all of
>      >> the `tryFormats` in order, until it finds one that works for all
>      >> vector components (or fail / use NA if none works);
>      >> and here it's only a format which drops the time that works for
>      >> all (i.e. both, in the example).
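
To make the effect concrete, here is a short sketch of what that means for the two-element example (tz = "UTC" is my own choice, to keep the output independent of the local timezone):

```r
ch <- c("1975-01-01", "1975-01-01 15:27:00")

## No 'format' given: the first of the tryFormats that parses *all* elements
## is the date-only one, so the 15:27:00 of the second element is dropped.
p <- as.POSIXct(ch, tz = "UTC")
format(p, "%Y-%m-%d %H:%M:%S", tz = "UTC")
#> [1] "1975-01-01 00:00:00" "1975-01-01 00:00:00"

## With an explicit 'format', nothing is guessed: an element that does not
## match becomes NA instead of the whole vector losing its time part.
q <- as.POSIXct(ch, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
is.na(q)
#> [1]  TRUE FALSE
```
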
>      >>
>      >> { Even though its behavior is well documented,
>      >> one could even argue that by default you'd want a warning in
>      >> such a case where "so much" is lost.
>      >> I think however that introducing such a warning  may trip too
>      >> much current code relying .. also, the extra *checking* may be
>      >> somewhat costly .. (?)  .... anyway that's an interesting side topic
>      >> }
>      >>
>      >> Instead what you want here is for each string (element of the
>      >> character vector) to try the `tryFormats` and use the best
>      >> available *individually*  {smart R users ==> "think lapply(.)"} :
>      >> Currently, this would be  "something like"  unlist(lapply(x, as.POSIXlt))
>      >> well, and then you need to jump a hoop additionally.
>      >> If you want POSIXct,  like this :
>      >>
>      >> .POSIXct(unlist(lapply( * , as.POSIXct)))
>      >>
>      >> For your example
>      >>
>      >> ch <- c("1975-01-01", "1975-01-01 15:27:00")
>      >>
>      >>> str(.POSIXct(unlist(lapply(ch, as.POSIXct))))
>      >> POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
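
The same strategy could be wrapped in a small helper. This is only a sketch: `parse_each` is a hypothetical name, not an existing R function, and the fixed tz = "UTC" default is an assumption made to keep the example reproducible.

```r
## Hypothetical helper: parse each string on its own, so that every element
## gets the first of the tryFormats that works for *it*.
parse_each <- function(x, tz = "UTC") {
  secs <- vapply(x,
                 function(s) as.numeric(as.POSIXct(s, tz = tz)),
                 numeric(1), USE.NAMES = FALSE)
  .POSIXct(secs, tz = tz)   # rebuild a POSIXct vector from seconds since the epoch
}

ch <- c("1975-01-01", "1975-01-01 15:27:00")
r  <- parse_each(ch)
format(r, "%Y-%m-%d %H:%M:%S", tz = "UTC")
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
```

Note that a string matching none of the tryFormats still raises an error here, just as in a plain as.POSIXct() call, and that parsing element by element will be slower than a single vectorised call on long vectors.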
>      >>
>      >> ---
>      >>
>      >> After all that, yes, I agree that we should consider making
>      >> this much easier. E.g.,  by adding an optional argument to
>      >> as.POSIXlt.character()   say, `each` with default FALSE such
>      >> that as.POSIXlt(*,  each=TRUE)
>      >> {and also as.POSIXct(*,  each=TRUE) } would follow the above
>      >> strategy.
>      >>
>      >> ?
>      >>
>      >> Martin
>      >>
>      >> --
>      >> Martin Maechler
>      >> ETH Zurich   and   R Core Team
>      >>
>      >> ______________________________________________
>      >> R-devel using r-project.org mailing list
>      >> https://stat.ethz.ch/mailman/listinfo/r-devel


