[R] Weird and changed as.roman() behavior
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Thu Jan 16 12:04:44 CET 2025
>>>>> Stephanie Evert
>>>>> on Wed, 15 Jan 2025 13:18:03 +0100 writes:
> Well, the real issue then seems to be that .roman2numeric uses an invalid regular expression:
>>> grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)
>> [1] TRUE TRUE TRUE TRUE TRUE
> or
>>> grepl("^I{,2}$", c("II", "III", "IIII"))
>> [1] TRUE TRUE FALSE
> Both the TRE and the PCRE specification only allow repetition quantifiers of the form
> {a}
> {a,b}
> {a,}
> https://laurikari.net/tre/documentation/regex-syntax/
> https://www.pcre.org/original/doc/html/pcrepattern.html#SEC17
> {,2} and {,4} are thus invalid and seem to result in undefined behaviour (which PCRE and TRE fill in different ways, but consistently not what was intended).
>> > grepl("^I{,2}$", c("II", "III", "IIII"))
>> [1] TRUE TRUE FALSE
>> > grepl("^I{,2}$", c("II", "III", "IIII"), perl=TRUE)
>> [1] FALSE FALSE FALSE
> Fix thus is easy: {,4} => {0,4}
> Best,
> Stephanie
Thanks a lot, Stephanie -- indeed, I think I would not have searched in
this direction at all.
(To me it seemed "obvious" that if {3,} is well defined, {,3}
would be so, too ... But I was *wrong*, and actually I also
understand that {,3} is not needed, and {0,3} is clearer,
whereas {3,} is not easy to re-express: '{0,inf}' or similar
would make the code considerably more complicated and probably slower.)
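As a small illustration of why the explicit lower bound matters (a sketch only, reusing the test vector from the quoted example; the names `x`, `tre` and `pcre` are made up here): with {0,2} both regex engines should give the same, intended answer.

```r
x <- c("II", "III", "IIII")

# Explicit lower bound {0,2}: a valid quantifier for both engines
tre  <- grepl("^I{0,2}$", x)               # default (TRE) engine
pcre <- grepl("^I{0,2}$", x, perl = TRUE)  # PCRE engine

print(tre)
print(pcre)
identical(tre, pcre)   # the two engines agree once the bound is explicit
```

Only "II" has at most two I's, so only the first element should match.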
Actually, to remain backward compatible (see Jani's original report:
he'd like "IIIII" to work, as it did for many/most of us),
we should replace {,4} by {0,5}.
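A rough sketch of what that replacement could look like. The pattern below is an assumption derived from the expression quoted above, not the code actually committed, and `rx_sketch` and `cand` are hypothetical names. It is applied to purely additive strings, i.e., without subtractive pairs such as "IV".

```r
# Assumed variant of the quoted pattern, with explicit lower bounds
# and {0,5} in place of {,4}; M{,3} simply becomes M{0,3} here.
rx_sketch <- "^M{0,3}D?C{0,5}L?X{0,5}V?I{0,5}$"

cand <- c("IIIII", "MMXXV", "IIIIII", "ABC")
ok <- grepl(rx_sketch, cand)
print(setNames(ok, cand))
```

With this pattern "IIIII" is accepted, while six I's in a row and non-roman input are still rejected.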
But there's more: our current help page
https://search.r-project.org/R/refmans/utils/html/roman.html
says
> Only numbers between 1 and 3999 have a unique representation
> as roman numbers, and hence others result in as.roman(NA).
which is really not quite true, in more than one sense:
1. as.roman(3899:3999) # works fine
not producing any NA
2. I think, e.g., "MMMM"
is a pretty unique representation of 4000.
Also, at least one other (online) tool,
https://www.rapidtables.com/convert/number/date-to-roman-numerals.html
does convert _dates_ up to the year 4999; see
https://www.rapidtables.com/convert/number/date-to-roman-numerals.html?msel=January&dsel=1&year=4999&fmtsel=MM.DD.YYYY
which gives MMMMCMXCIX for 4999.
Hence, I also think we should enlarge the valid range from the current
{1 .. 3999} to
{1 .. 4999}.
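To make the proposed range concrete, here is a minimal stand-alone sketch of a greedy integer-to-roman conversion over 1 .. 4999. The function `to_roman_sketch` is hypothetical and written only for illustration; it is not the utils implementation of as.roman().

```r
# Hypothetical helper: greedy conversion for a single integer in 1 .. 4999
to_roman_sketch <- function(n) {
    stopifnot(length(n) == 1L, is.numeric(n), !is.na(n),
              n == round(n), n >= 1, n <= 4999)
    symbols <- c("M", "CM", "D", "CD", "C", "XC", "L", "XL",
                 "X", "IX", "V", "IV", "I")
    values  <- c(1000L, 900L, 500L, 400L, 100L, 90L, 50L, 40L,
                 10L, 9L, 5L, 4L, 1L)
    n <- as.integer(n)
    out <- character(0)
    for (i in seq_along(values)) {
        k <- n %/% values[i]               # how often this symbol fits
        out <- c(out, rep(symbols[i], k))  # append it k times (possibly 0)
        n <- n - k * values[i]             # remainder still to convert
    }
    paste(out, collapse = "")
}

to_roman_sketch(4000)
to_roman_sketch(4999)
```

Under these assumptions 4000 comes out as "MMMM" and 4999 as "MMMMCMXCIX", matching the online converter mentioned above.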
Martin