[Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time
Tim Taylor
t|m@t@y|or @end|ng |rom h|ddene|eph@nt@@co@uk
Tue Aug 15 12:00:24 CEST 2023
Many thanks Martin!
I was completely overlooking the behaviour for a length 1 vector with
00:00:00. More coffee needed for me I think.
Best
Tim
On 15/08/2023 08:58, Martin Maechler wrote:
>>>>>> Tim Taylor
>>>>>> on Mon, 14 Aug 2023 12:26:51 +0100 writes:
> > Martin,
> > Thank you. Everything you have written is helpful and I admit I am likely guilty of using as.character() instead of format() in the past.
>
> > Ignoring the above though, one thing I’m still unclear on is the special handling of zero (or rather non-zero time) seconds in the method. Is the motivation that as.character() outputs the minimum necessary information? It is clearly a very deliberate choice but the reasoning is still going a little over my head.
>
> > Best
> > Tim
>
> Hmm, I really don't understand what you don't understand.
> Here's some annotated R code exemplifying that indeed now,
> as.character(x)[j] === as.character(x[j])
> but previously that was not fulfilled {when as.character() was
> the same as format() for POSIXct or POSIXlt}:
>
> ##-----------------------------------------------------------------------------
> x0 <- c("1975-01-01 00:00:00", "1975-01-01 15:27:00")
> t0 <- as.POSIXct(x0)
> str(t0) # POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
> t0 # "1975-01-01 00:00:00 CET" "1975-01-01 15:27:00 CET"
> t0[1] # "1975-01-01 CET" <-- yes, *no* 00:00:00 in any version of R
>
> ## In R <= 4.2.x as.character() was using format() for POSIX{ct,lt} :
> as.character(t0) # "1975-01-01 00:00:00" "1975-01-01 15:27:00" << for R <= 4.2.x
> as.character(t0) # "1975-01-01" "1975-01-01 15:27:00" << for R >= 4.3.0
> as.character(t0[1]) # "1975-01-01" {in all versions of R}
>
>
> Note that indeed as.character() does drop redundant trailing 0s :
>
> > as.character(c(0.5, 0.75, pi))
> [1] "0.5" "0.75" "3.14159265358979"
>
> whereas format() does not (ensuring resulting strings of the same nchar(.)):
>
> > format( c(0.5, 0.75, pi))
> [1] "0.500000" "0.750000" "3.141593"
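A minimal sketch of the element-wise property discussed above, added for illustration and not taken from the original messages. It uses only base R and plain numbers, so it should not depend on the R version or the local time zone; the variable name `x` is arbitrary.

```r
x <- c(0.5, 0.75, pi)

# as.character() converts each element on its own, so subsetting
# before or after the conversion gives the same string.
stopifnot(identical(as.character(x)[1], as.character(x[1])))

# format() pads to a common width, so the string for one element
# depends on the other elements of the vector.
format(x)[1]   # "0.500000"
format(x[1])   # "0.5"
```

The same reasoning is what motivates dropping "00:00:00" for a midnight element: the result for `x[j]` should not change because of its neighbours.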
>
>
>
> >> On 14 Aug 2023, at 09:52, Martin Maechler <maechler using stat.math.ethz.ch> wrote:
> >>
> >>
> >>>
> >>>>>>> Andy Teucher
> >>>>>>> on Fri, 11 Aug 2023 16:07:36 -0700 writes:
> >>
> >>> I understand that `as.character.POSIXt()` had an overhaul in R 4.3 (https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b), and I have come across a new behaviour and I wonder if it is unintended?
> >>
> >> Well, as the NEWS entry says
> >> (partly visible in the url above -- which only shows one part of
> >> the several changes for R 4.3) :
> >>
> >> • as.character(<POSIXt>) now behaves more in line with the methods
> >> for atomic vectors such as numbers, and is no longer influenced
> >> by options(). Ditto for as.character(<Date>). The
> >> as.character() method gets arguments digits and OutDec with
> >> defaults _not_ depending on options(). Use of as.character(*,
> >> format = .) now warns.
> >>
> >> It was "inconsistent" to have as.character(.) basically use format(.) for
> >> these datetime objects.
> >> as.character(x) for basic R types such as numbers, strings, logicals,...
> >> fulfills the important property
> >>
> >> as.character(x)[j] === as.character(x[j])
> >>
> >> whereas that is very much different for format() where indeed,
> >> the formatting of x[1] may quite a bit depend on the other
> >> x[j]'s values:
> >>
> >>> as.character(c(1, pi, pi/2^20))
> >> [1] "1" "3.14159265358979" "2.99605622633914e-06"
> >>
> >>> format(c(1, pi, pi/2^20))
> >> [1] "1.000000e+00" "3.141593e+00" "2.996056e-06"
> >>> format(c(1, pi))
> >> [1] "1.000000" "3.141593"
> >>> format(c(1, 10))
> >> [1] " 1" "10"
> >>>
> >>
> >>
> >>> When calling `as.character.POSIXt()` on a vector that contains elements where the time component is midnight (00:00:00), it drops the time component of that element in the resulting character vector. Previously the time component was retained:
> >>
> >>> In R 4.2.3:
> >>
> >>> ```
> >>> R.version$version.string
> >>> #> [1] "R version 4.2.3 (2023-03-15)"
> >>
> >>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
> >>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
> >>
> >>> (tc <- as.character(t))
> >>> #> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
> >>> ```
> >>
> >>> In R 4.3.1:
> >>
> >>> ```
> >>> R.version$version.string
> >>> #> [1] "R version 4.3.1 (2023-06-16)"
> >>
> >>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
> >>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
> >>
> >>> (tc <- as.character(t))
> >>> #> [1] "1975-01-01" "1975-01-01 15:27:00"
> >>> ```
> >>
> >> You should have used format() here or at least should do so now.
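One way to follow the format() advice, sketched here for illustration and not taken from the original messages: pass an explicit format string so every element gets the same layout, midnight included. The time zone "UTC" and the names `t0`, `s` and `back` are assumptions chosen to keep the example reproducible.

```r
# Fixed time zone so the result does not depend on the local setting.
t0 <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")

# An explicit format string keeps the time part for every element.
s <- format(t0, "%Y-%m-%d %H:%M:%S")
s
# [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"

# Because every string now carries a time, parsing them back
# recovers both instants (to whole seconds).
back <- as.POSIXct(s, tz = "UTC")
stopifnot(all(back == t0))
```

This only preserves whole seconds and relies on supplying the same time zone on both sides, so the caveats about round trips still apply.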
> >>
> >>> This has consequences when round-tripping from POSIXt ->
> >>> character -> POSIXt,
> >>
> >> Well, I'd argue that such a "round trip" is not a "good idea"
> >> anyway, as there are quite a few platform (local timezone for
> >> one) issues, and precision is lost, notably for POSIXlt which
> >> may be more precise than you typically get, etc.
> >>
> >>> since `as.POSIXct.character()` drops the time component from the entire vector if any element does not have a time component:
> >>
> >> Well, there *is* no as.POSIXct.character() {but we understand what you mean}:
> >> If you look at the help page you'd see that there's as.POSIXlt.character()
> >> {which is called from as.POSIXct.default()}
> >> with a 3rd argument 'format' and a 4th argument 'tryFormats'
> >> {and a lot more information -- the whole topic is far from trivial}.
> >>
> >> Now, indirectly you would want R to be "smart", i.e. the
> >> as.POSIXlt.character() method "guess better" about what the
> >> user wants. ...
> >> ... and I agree that is not an unreasonable expectation, e.g.,
> >> for your example of wanting
> >>
> >> c("1975-01-01", "1975-01-01 15:27:00")
> >>
> >> to "work".
> >>
> >> as.POSIXlt.character() is well documented to be trying all of
> >> the `tryFormats` in order, until it finds one that works for all
> >> vector components (or fail / use NA if none works);
> >> and here it's only a format which drops the time that works for
> >> all (i.e. both, in the example).
> >>
> >> { Even though its behavior is well documented,
> >> one could even argue that by default you'd want a warning in
> >> such a case where "so much" is lost.
> >> I think however that introducing such a warning may trip too
> >> much current code relying .. also, the extra *checking* may be
> >> somewhat costly .. (?) .... anyway that's an interesting side topic
> >> }
> >>
> >> Instead what you want here is for each string (element of the
> >> character vector) to try the `tryFormats` and use the best
> >> available *individually* {smart R users ==> "think lapply(.)"} :
> >> Currently, this would be "something like" unlist(lapply(x, as.POSIXlt))
> >> well, and then you need to jump a hoop additionally.
> >> If you want POSIXct, like this :
> >>
> >> .POSIXct(unlist(lapply( * , as.POSIXct)))
> >>
> >> For your example
> >>
> >> ch <- c("1975-01-01", "1975-01-01 15:27:00")
> >>
> >>> str(.POSIXct(unlist(lapply(ch, as.POSIXct))))
> >> POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
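A sketch contrasting the two strategies described above, added for illustration and not taken from the original messages. Passing `tz = "UTC"` everywhere is an assumption made to keep the output independent of the local time zone; the names `whole` and `each` are hypothetical.

```r
ch <- c("1975-01-01", "1975-01-01 15:27:00")

# Whole-vector parsing: one entry of tryFormats has to fit every
# element, and here only the date-only format does, so the time of
# the second element is lost.
whole <- as.POSIXct(ch, tz = "UTC")
format(whole, "%H:%M:%S")
# [1] "00:00:00" "00:00:00"

# Element-wise parsing: each string gets the first format that fits it.
# unlist() drops the class, so .POSIXct() restores it.
each <- .POSIXct(unlist(lapply(ch, as.POSIXct, tz = "UTC")), tz = "UTC")
format(each, "%H:%M:%S")
# [1] "00:00:00" "15:27:00"
```

The element-wise version calls the parser once per string, so it may be noticeably slower on long vectors.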
> >>
> >> ---
> >>
> >> After all that, yes, I agree that we should consider making
> >> this much easier. E.g., by adding an optional argument to
> >> as.POSIXlt.character() say, `each` with default FALSE such
> >> that as.POSIXlt(*, each=TRUE)
> >> {and also as.POSIXct(*, each=TRUE) } would follow the above
> >> strategy.
> >>
> >> ?
> >>
> >> Martin
> >>
> >> --
> >> Martin Maechler
> >> ETH Zurich and R Core Team
> >>
> >> ______________________________________________
> >> R-devel using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel