[Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time
Tim Taylor
t|m@t@y|or @end|ng |rom h|ddene|eph@nt@@co@uk
Mon Aug 14 13:26:51 CEST 2023
Martin,
Thank you. Everything you have written is helpful, and I admit I have likely been guilty of using as.character() instead of format() in the past.
Ignoring the above though, one thing I’m still unclear on is the special handling of zero (or rather non-zero time) seconds in the method. Is the motivation that as.character() outputs the minimum necessary information? It is clearly a very deliberate choice but the reasoning is still going a little over my head.
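To make the question concrete, here is a minimal sketch of the behaviour I mean. It assumes R >= 4.3 for the as.character() results, and it fixes tz = "UTC" only so that the output does not depend on the local time zone; the variable names are purely illustrative.

```r
x <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")

# as.character(): in R >= 4.3 each element is converted on its own, so
# converting the whole vector or one element at a time should agree.
chr_whole <- as.character(x)
chr_each  <- vapply(seq_along(x), function(j) as.character(x[j]), character(1))
same_chr  <- identical(chr_whole, chr_each)

# format() without a format string chooses one layout for the whole vector,
# so the result for x[1] can depend on what else is in x.
fmt_whole <- format(x)
fmt_each  <- vapply(seq_along(x), function(j) format(x[j]), character(1))
same_fmt  <- identical(fmt_whole, fmt_each)

# An explicit format string removes the guessing in either case.
full <- format(x, "%Y-%m-%d %H:%M:%S")
full
# [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
```

Only the last call is independent of both the R version and the other elements of the vector, which is presumably why format() with an explicit format is the recommended route.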
Best
Tim
> On 14 Aug 2023, at 09:52, Martin Maechler <maechler using stat.math.ethz.ch> wrote:
>
>
>>
>>>>>> Andy Teucher
>>>>>> on Fri, 11 Aug 2023 16:07:36 -0700 writes:
>
>> I understand that `as.character.POSIXt()` had an overhaul in R 4.3 (https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b), and I have come across a new behaviour and I wonder if it is unintended?
>
> Well, as the NEWS entry says
> (partly visible in the url above -- which only shows one part of
> the several changes for R 4.3) :
>
> • as.character(<POSIXt>) now behaves more in line with the methods
> for atomic vectors such as numbers, and is no longer influenced
> by options(). Ditto for as.character(<Date>). The
> as.character() method gets arguments digits and OutDec with
> defaults _not_ depending on options(). Use of as.character(*,
> format = .) now warns.
>
> It was "inconsistent" to have as.character(.) basically use format(.) for
> these datetime objects.
> as.character(x) for basic R types such as numbers, strings, logicals,...
> fulfills the important property
>
> as.character(x)[j] === as.character(x[j])
>
> whereas that is very much different for format() where indeed,
> the formatting of x[1] may quite a bit depend on the other
> x[j]'s values:
>
>> as.character(c(1, pi, pi/2^20))
> [1] "1" "3.14159265358979" "2.99605622633914e-06"
>
>> format(c(1, pi, pi/2^20))
> [1] "1.000000e+00" "3.141593e+00" "2.996056e-06"
>> format(c(1, pi))
> [1] "1.000000" "3.141593"
>> format(c(1, 10))
> [1] " 1" "10"
>>
>
>
>> When calling `as.character.POSIXt()` on a vector that contains elements where the time component is midnight (00:00:00), it drops the time component of that element in the resulting character vector. Previously the time component was retained:
>
>> In R 4.2.3:
>
>> ```
>> R.version$version.string
>> #> [1] "R version 4.2.3 (2023-03-15)"
>
>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
>
>> (tc <- as.character(t))
>> #> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
>> ```
>
>> In R 4.3.1:
>
>> ```
>> R.version$version.string
>> #> [1] "R version 4.3.1 (2023-06-16)"
>
>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
>
>> (tc <- as.character(t))
>> #> [1] "1975-01-01" "1975-01-01 15:27:00"
>> ```
>
> You should have used format() here or at least should do so now.
>
>> This has consequences when round-tripping from POSIXt ->
>> character -> POSIXt,
>
> Well, I'd argue that such a "round trip" is not a "good idea"
> anyway, as there are quite a few platform (local timezone for
> one) issues, and precision is lost, notably for POSIXlt which
> may be more precise than you typically get, etc.
>
>> since `as.POSIXct.character()` drops the time component from the entire vector if any element does not have a time component:
>
> Well, there *is* no as.POSIXct.character() {but we understand what you mean}:
> If you look at the help page you'd see that there's as.POSIXlt.character()
> {which is called from as.POSIXct.default()}
> with a 3rd argument 'format' and a 4th argument 'tryFormats'
> {and a lot more information -- the whole topic is far from trivial}.
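The explicit 'format' argument also offers one way to keep a character round trip from relying on any guessing. A minimal sketch, assuming whole-second precision and tz = "UTC" (both are assumptions made for illustration, not something stated in the thread):

```r
fmt <- "%Y-%m-%d %H:%M:%S"

x <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")

# POSIXct -> character: always write the time part, midnight included.
s <- format(x, format = fmt, tz = "UTC")

# character -> POSIXct: parse with the same format and time zone,
# so tryFormats is never consulted.
back <- as.POSIXct(s, format = fmt, tz = "UTC")

round_trip_ok <- all(x == back)
round_trip_ok
# [1] TRUE
```

This does not address the precision and platform caveats raised above; fractional seconds, for instance, would need a different format string.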
>
> Now, indirectly you would want R to be "smart", i.e. the
> as.POSIXlt.character() method "guess better" about what the
> user wants. ...
> ... and I agree that is not an unreasonable expectation, e.g.,
> for your example of wanting
>
> c("1975-01-01", "1975-01-01 15:27:00")
>
> to "work".
>
> as.POSIXlt.character() is well documented to be trying all of
> the `tryFormats` in order, until it finds one that works for all
> vector components (or fail / use NA if none works);
> and here it's only a format which drops the time that works for
> all (i.e. both, in the example).
>
> { Even though its behavior is well documented,
> one could even argue that by default you'd want a warning in
> such a case where "so much" is lost.
> I think however that introducing such a warning may trip too
> much current code relying .. also, the extra *checking* may be
> somewhat costly .. (?) .... anyway that's an interesting side topic
> }
>
> Instead what you want here is for each string (element of the
> character vector) to try the `tryFormats` and use the best
> available *individually* {smart R users ==> "think lapply(.)"} :
> Currently, this would be "something like" unlist(lapply(x, as.POSIXlt))
> well, and then you need to jump through an additional hoop.
> If you want POSIXct, like this :
>
> .POSIXct(unlist(lapply( * , as.POSIXct)))
>
> For your example
>
> ch <- c("1975-01-01", "1975-01-01 15:27:00")
>
>> str(.POSIXct(unlist(lapply(ch, as.POSIXct))))
> POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
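The difference between whole-vector and per-element parsing can be sketched as follows. The helper name as_POSIXct_each is hypothetical (it is not part of base R), and tz = "UTC" is an assumption made only to keep the output independent of the local time zone.

```r
# Hypothetical helper, not part of base R: parse each string on its own so
# that every element gets the first of the default tryFormats that fits it.
as_POSIXct_each <- function(x, tz = "UTC") {
  secs <- unlist(lapply(x, as.POSIXct, tz = tz))  # unlist() drops the class
  .POSIXct(secs, tz = tz)                         # restore class and time zone
}

ch <- c("1975-01-01", "1975-01-01 15:27:00")

whole <- as.POSIXct(ch, tz = "UTC")  # one format chosen for the whole vector
each  <- as_POSIXct_each(ch)         # one format chosen per element

format(whole, "%Y-%m-%d %H:%M:%S")
# [1] "1975-01-01 00:00:00" "1975-01-01 00:00:00"
format(each, "%Y-%m-%d %H:%M:%S")
# [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
```

The per-element version calls as.POSIXct() once per string, so it will be slower on long vectors, and a string that matches none of the formats raises an error rather than yielding NA.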
>
> ---
>
> After all that, yes, I agree that we should consider making
> this much easier. E.g., by adding an optional argument to
> as.POSIXlt.character() say, `each` with default FALSE such
> that as.POSIXlt(*, each=TRUE)
> {and also as.POSIXct(*, each=TRUE) } would follow the above
> strategy.
>
> ?
>
> Martin
>
> --
> Martin Maechler
> ETH Zurich and R Core Team
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel