[Rd] Help useRs to use R's own Time/Date objects more efficiently

Abby Spurdle
Sun Apr 5 22:57:21 CEST 2020


I think POSIXct and POSIXlt are badly-chosen names.
The name "POSIX" implies UNIX.
(i.e. XYZix operating system is mostly POSIX compliant... Woo-Hoo!).
My assumption is that most people modelling industrial/econometric
data etc, or data imported from databases, don't want system
references everywhere.

Historically, I've used the principle that:
If programming language A uses functionality from programming language
B, then bindings should be as close as possible to whatever is in
programming language B. Any additional functionality in programming
language A, should be distinct from the bindings.
R hasn't done this here: the POSIX bindings have had additional R
functionality and semantics added to them, possibly introducing
problems at an early stage.

The help file entitled DateTimeClasses only covers a small subset of
information on date and time classes, with no obvious information
about how to construct date and time objects, except for what's in the
examples. The Date class has a similar problem, omitting information
about how to construct Date objects.
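To illustrate the kind of construction information I mean, here is a minimal sketch using base R only. The dates and the "UTC" time zone are arbitrary choices for the example, not anything taken from the help files.

```r
# Date from a character string (ISO 8601 style is parsed by default)
d1 <- as.Date("2020-04-05")

# Non-ISO input needs an explicit format string
d2 <- as.Date("05/04/2020", format = "%d/%m/%Y")

# Date-time as POSIXct, with an explicit time zone
t1 <- as.POSIXct("2020-04-05 22:57:21", tz = "UTC")

# Date-time built from numeric components
t2 <- ISOdatetime(2020, 4, 5, 22, 57, 21, tz = "UTC")
```

Here d1 and d2 represent the same day, and t1 and t2 the same instant.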

The "convenience extraction functions" aren't necessarily convenient
because they return text rather than integers, requiring many users to
still use the POSIXlt class.
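A short sketch of the difference, assuming nothing beyond base R (the date is arbitrary): weekdays() and months() give locale-dependent text, while the integer components come from POSIXlt.

```r
d <- as.Date("2020-04-05")

# Convenience functions: character results, in the current locale
wd_text <- weekdays(d)
mo_text <- months(d)

# Integer components require going through POSIXlt
lt <- as.POSIXlt(d)
wd_int <- lt$wday           # 0-based, 0 = Sunday
mo_int <- lt$mon + 1L       # mon is 0-based, so add 1
yr_int <- lt$year + 1900L   # year is stored as years since 1900
```

The offsets (0-based months, years since 1900) are exactly the POSIX semantics a user has to learn before the integers are usable.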

I don't think your example is simple.
I suspect it may discourage some people from using base packages,
having the opposite effect to what's intended.

It's probably too late to change the functions, but here's what I would suggest:

(1) Create a top-level help page with a title like "Date and Time
Classes" to give a brief but general overview. This would mean the
existing DateTimeClasses would need a new title.
(2) Create another function the same as as.POSIXlt, but with a more
user-friendly name, which would increase readability.
(3) If help files for describing classes are separate from the help
files for creating/coercing objects (e.g. Date vs as.Date), then I
think they should cross reference each other in the description field,
not just the details or seealso fields.
(4) Reference relevant extraction/formatting functions, in most
date/time help files, even if there's some (small) duplication in the
help files.
(5) Focus on keeping the examples simple rather than comprehensive.
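As a rough sketch of suggestion (2): the name as_calendar_time below is purely hypothetical (it does not exist in base R), and the wrapper simply forwards to as.POSIXlt.

```r
# Hypothetical user-friendly alias for as.POSIXlt; not part of base R
as_calendar_time <- function(x, ...) as.POSIXlt(x, ...)

ct <- as_calendar_time("2020-04-05 22:57:21", tz = "UTC")
hr <- ct$hour   # hour of the day
mn <- ct$min    # minute of the hour
```

The returned object is still a POSIXlt, so existing methods keep working; only the name the user has to find and read changes.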

Expanding on suggestion (4), if you read the help file for as.Date
(which seems like an obvious starting point, because that's where I
started reading...), there's no reference at all to getting the month,
or the day of the week, etc. To make it worse it doesn't mention
coercion to POSIXlt objects either (but does mention coercion from
POSIXlt to Date objects). This could give the wrong impression to many
readers...

In its defense, it does link to Date, which links to weekdays, which
links to as.POSIXlt.

Of course the note and seealso fields are near the bottom, and there's
an implicit (possibly false) assumption that the reader will read all
the help file*s*, and follow the links at the bottom, at least three
times over.
And a new-ish R user is likely to have to read more than four help files,
unless they Google it, read Stack Exchange, or read some fancy
(apparently modern) textbook on data science...

This reinforces the need for the help files to be clear about what the
functions (collectively) can do and specifically what
extraction/formatting functionality is available...

My guess is that the most common tasks with date and time objects are:
(1) Reading a character vector representing dates/times.
(2) Formatting a date/time (i.e. Object to character vector, or
character vector to another character vector).
(3) Extracting information such as month, weekday, etc, either as an
integer or as text.

So, in short, these should be easy (to do, and find out how to do)...
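For reference, a minimal sketch of the three tasks on a character vector, using only as.Date and format from base R. The input values and the output format are arbitrary choices for the example.

```r
x <- c("2020-04-04", "2020-04-05", "2020-04-06")

# (1) Read a character vector of dates
d <- as.Date(x)

# (2) Format: object to character, i.e. character to character overall
txt <- format(d, "%d/%m/%Y")

# (3) Extract the month as an integer, and the weekday as text
month_int <- as.integer(format(d, "%m"))
wday_txt  <- format(d, "%A")   # weekday name, locale-dependent
```

Going through format() avoids POSIXlt entirely, at the cost of a round trip through character strings.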


On Sat, Apr 4, 2020 at 10:51 PM Martin Maechler
<maechler using stat.math.ethz.ch> wrote:
>
> This is mostly a RFC  [but *not* about the many extra packages, please..]:
>
> Noticing to my chagrin  how my students work in a project,
> googling for R code and cut'n'pasting stuff together, accumulating
> this and that package on the way  all just for simple daily time series
> (though with partly missing parts),
> using chron, zoo, lubridate, ...  all for things that are very
> easy in base R *IF* you read help pages and start thinking on
> your own (...), I've noted once more that the above "if" is a
> very strong one, and seems to happen rarely nowadays by typical R users...
> (yes, I stop whining for now).
>
> In this case, I propose to slightly improve the situation ...
> by adding a few more lines to one help page [[how could that
> help in the age where "google"+"cut'n'paste" has replaced thinking ? .. ]] :
>
> On R's own ?Dates  help page (and also on ?DateTimeClasses )
> we have pointers, notably
>
> See Also:
>
>      ...............
>      ...............
>
>      'weekdays' for convenience extraction functions.
>
> So people must find that and follow the pointer
> (instead of installing one of the dozen helper packages).
>
> Then on that page, one sees  weekdays(), months() .. julian()
> in the usage ... which don't seem directly helpful for a person
> who needs more.  If that person is diligent and patient (as good useRs are ;-),
> she finds
>
>    Note:
>
>         Other components such as the day of the month or the year are very
>         easy to compute: just use 'as.POSIXlt' and extract the relevant
>         component.  Alternatively (especially if the components are
>         desired as character strings), use 'strftime'.
>
>
> But then, nowadays, the POSIXlt class is not so transparent to the
> non-expert anymore (as it behaves very much like POSIXct, and
> not like a list for good reasons) .. and so 97%  of R users will
> not find this "very easy".
>
> For this reason, I propose to add the following to the
> 'Examples:' section of the help file ...
> and I hope that also readers of  R-devel  who have not been
> aware of how to do this nicely,  will now remember (or remember
> where to look?).
>
> I at least will tell my students in the future to use these or
> write versions of these simple utility functions.
>
>
> ------------------------------------------------
>
> ## Show how easily you get month, day, year, day (of {month, week, yr}), ... :
> ## (remember to count from 0 (!): mon = 0..11, wday = 0..6,  etc !!)
>
> ##' Transform (Time-)Date vector  to  convenient data frame :
> dt2df <- function(dt, dName = deparse(substitute(dt)), stringsAsFactors = FALSE) {
>     DF <- as.data.frame(unclass(as.POSIXlt( dt )), stringsAsFactors=stringsAsFactors)
>     `names<-`(cbind(dt, DF, deparse.level=0L), c(dName, names(DF)))
> }
> dt2df(.leap.seconds)    # date+time
> dt2df(Sys.Date() + 0:9) # date
>
> ##' Even simpler:  Date -> Matrix:
> d2mat <- function(x) simplify2array(unclass(as.POSIXlt(x)))
> d2mat(seq(as.Date("2000-02-02"), by=1, length.out=30)) # has R 1.0.0's release date
>
> ------------------------------------------------------------
>
> In the distant past / one of the last times I touched on people
> using (base) R's  Date / Time-Date  objects, I had started
> thinking if we should not provide some simple utilities to "base R"
> (not in the 'base' pkg, but rather 'utils') for "extracting" from
> {POSIX(ct), Date} objects ... and we may have discussed that
> within R Core 20 years ago,  and had always thought that this
> shouldn't be hard for useRs themselves to see how to do...
>
> But then I see that "everybody" uses extension packages instead,
> even in the many situations where there's no gain doing so,
> but rather increases the dependency-complexity of the data analysis
> unnecessarily.
>
> Martin Maechler
> ETH Zurich  and   R Core Team.
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


