[R] Weird and changed as.roman() behavior

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Thu Jan 16 12:04:44 CET 2025


>>>>> Stephanie Evert 
>>>>>     on Wed, 15 Jan 2025 13:18:03 +0100 writes:

    > Well, the real issue then seems to be that .roman2numeric uses an invalid regular expression:
    >>> grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)
    >> [1] TRUE TRUE TRUE TRUE TRUE

    > or 

    >>> grepl("^I{,2}$", c("II", "III", "IIII"))
    >> [1]  TRUE  TRUE FALSE


    > Both the TRE and the PCRE specification only allow repetition quantifiers of the form

    > {a}
    > {a,b}
    > {a,}

    > https://laurikari.net/tre/documentation/regex-syntax/
    > https://www.pcre.org/original/doc/html/pcrepattern.html#SEC17

    > {,2} and {,4} are thus invalid and seem to result in undefined behaviour (which PCRE and TRE fill in different ways, but consistently not what was intended). 

    >> > grepl("^I{,2}$", c("II", "III", "IIII"))
    >> [1]  TRUE  TRUE FALSE

    >> > grepl("^I{,2}$", c("II", "III", "IIII"), perl=TRUE)
    >> [1] FALSE FALSE FALSE

    > Fix thus is easy: {,4} => {0,4}

    > Best,
    > Stephanie

Thanks a lot, Stephanie -- indeed, I think I would not have searched in
this direction at all.
( To me it seemed "obvious" that if {3,} is well defined,  {,3}
  would be so, too...  But I was *wrong*, and actually I also
  understand that {,3} is not needed and {0,3} is clearer,
  whereas {3,} is not easy to re-express: '{3,inf}' or similar
  would make the code considerably more complicated and probably slower. )

Actually, to remain backward compatible (see Jani's original report:
he'd like "IIIII" to work, as it did for many/most of us),
we should replace  {,4}  by  {0,5}.
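
A minimal sketch of what the two candidate bounds accept, assuming only
the single-letter pattern for "I" (this is an illustration, not the
actual .roman2numeric code). With an explicit lower bound, TRE (the
default engine) and PCRE (perl = TRUE) agree:

```r
# Illustration only: a one-letter pattern, not the full pattern used in utils.
x <- c("II", "IIII", "IIIII", "IIIIII")

# Explicit bounds {0,4}: at most four repetitions.
tre4  <- grepl("^I{0,4}$", x)               # TRUE TRUE FALSE FALSE
pcre4 <- grepl("^I{0,4}$", x, perl = TRUE)  # same result

# Explicit bounds {0,5}: "IIIII" is accepted as well.
tre5  <- grepl("^I{0,5}$", x)               # TRUE TRUE TRUE FALSE
pcre5 <- grepl("^I{0,5}$", x, perl = TRUE)  # same result

print(rbind(tre4, pcre4, tre5, pcre5))
```

So {0,4} would reject "IIIII", whereas {0,5} keeps accepting it.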

But there's more:  our current help page
    https://search.r-project.org/R/refmans/utils/html/roman.html
says

> Only numbers between 1 and 3999 have a unique representation
> as roman numbers, and hence others result in as.roman(NA). 
 
which is really not quite true, in more than one sense:

1.  as.roman(3899:3999)   # works fine

not producing any NA

2. I think, e.g.,  "MMMM"
is a pretty unique representation of 4000.

Also, at least one other (online) tool,
    https://www.rapidtables.com/convert/number/date-to-roman-numerals.html

does convert _dates_ up to the year 4999, see
  https://www.rapidtables.com/convert/number/date-to-roman-numerals.html?msel=January&dsel=1&year=4999&fmtsel=MM.DD.YYYY

giving  MMMMCMXCIX  for 4999.

Hence, I also think we should enlarge the valid range from the current
{1 .. 3999}  to
{1 .. 4999}
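
To illustrate that the enlarged range poses no representational problem,
here is a rough sketch of a greedy conversion covering 1 .. 4999. The
function name to_roman is hypothetical and this is not how as.roman() is
implemented in utils; it only shows that up to four leading "M"s are
enough:

```r
# Hypothetical helper for illustration; not the utils implementation.
to_roman <- function(n) {
  stopifnot(length(n) == 1L, n >= 1, n <= 4999)
  vals <- c(1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1)
  syms <- c("M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I")
  out <- character(0)
  for (i in seq_along(vals)) {
    k <- n %/% vals[i]               # how often this symbol fits
    out <- c(out, rep(syms[i], k))   # append it that many times
    n <- n %% vals[i]                # continue with the remainder
  }
  paste(out, collapse = "")
}

to_roman(4000)  # "MMMM"
to_roman(4999)  # "MMMMCMXCIX"
```

The result for 4999 agrees with the one given by the online converter above.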

Martin


