[Rd] New strptime conversion specification for ordinal suffixes

Michael Chirico michaelchirico4 at gmail.com
Wed Aug 31 16:10:47 CEST 2016


As touched on briefly on SO <http://stackoverflow.com/questions/39237299>,
base R has what appears to me to be a serious deficiency: it cannot
recognize dates formatted as character strings with ordinal suffixes:

ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
               "September 3rd, 2016", "September 4th, 2016")

?strptime lists no conversion specification which could match ord_dates in
one pass (as I discovered, even lubridate only manages to succeed by going
through the vector in several passes).
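
For reference, the workaround available today is a two-step one: strip the
suffix with a regular expression, then parse as usual. This is only a minimal
sketch. It assumes English suffixes and English month names (the LC_TIME
locale is switched to "C" temporarily for that reason), and the names
plain_dates and parsed are just illustrative.

```r
# Assumption: month names are English, so parse under the "C" LC_TIME locale.
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")

ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
               "September 3rd, 2016", "September 4th, 2016")

# Step 1: drop an English ordinal suffix that directly follows a digit.
# Requiring the digit keeps the "st" in a month name like "August" intact.
plain_dates <- sub("([0-9])(st|nd|rd|th)", "\\1", ord_dates)

# Step 2: parse with conversion specifications that already exist.
parsed <- as.Date(plain_dates, format = "%B %d, %Y")

# Restore the caller's locale.
Sys.setlocale("LC_TIME", old_locale)
```

This works, but it needs a separate pass over the strings and hard-codes the
suffixes, which is exactly what a dedicated conversion specification would
avoid.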

How difficult would it be to add a new conversion specification to handle
this? Dates written this way seem to me to be pretty common in raw data in
the wild.

My suggestion would be %o for ordinal suffixes. These would obviously be
locale-specific, but in English, %o would match:


   - st
   - nd
   - rd
   - th


Other languages may be covered by the Wikipedia pages on ordinal indicators
<https://en.wikipedia.org/wiki/Ordinal_indicator> and on Unicode
superscripts
<https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts>.

With this implemented, converting ord_dates to a Date or POSIXct would be
as simple as:

as.Date(ord_dates, format = "%B %d%o, %Y")

Is there something on the C level preventing this from happening?

Michael Chirico
