[R] Weird and changed as.roman() behavior

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Wed Jan 15 11:41:34 CET 2025


>>>>> Jani Välimaa 
>>>>>     on Tue, 14 Jan 2025 20:39:19 +0200 writes:

    > Hello,
    > I don't know what has changed, or how to figure out why as.roman() started
    > to behave differently lately on Mageia Cauldron. Cauldron is the
    > latest development version of Mageia Linux.

    > Expected behavior:
    >> as.roman(strrep("I", 1:5))
    > [1] I   II  III IV  V  

    > Current behavior:
    >> as.roman(strrep("I", 1:5))
    > [1] I    II   III  IV   <NA>
    > Warning message:
    > In .roman2numeric(x) : invalid roman numeral: IIIII

    > as.roman() doesn't handle "IIIII" -> "V" anymore and thus 'make check'
    > fails when building any 4.3.x or 4.4.x versions from the sources.

Not yet.
For me (on Fedora Linux 40), with current R 4.4.2, R-patched and
R-devel, I get the same good results from

 (cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
 
  > (cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
  [1] "I"     "II"    "III"   "IIII"  "IIIII"
  [1] I   II  III IV  V  
  structure(1:5, class = "roman")
  >

The code behind this uses grep() and grepl()
and I assume this somehow does not work correctly on your
platform?

Digging a bit further, the crucial part in this case happens in
the non-exported function   utils:::.roman2numeric
which you probably already know from the above warning.
For me,

 (cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc))

gives

  > (cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc))
  [1] "I"     "II"    "III"   "IIII"  "IIIII"
  [1] 1 2 3 4 5
  >

This must be different in your case.

You can use
	debug(utils:::.roman2numeric)
and
	utils:::.roman2numeric(cc)

to find out where the problem happens.
This will almost surely show that the problem is indeed in a
grepl() call.
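
Before stepping through with debug(), it may help to list the lines of
the function that call grepl(), so you know which pattern to watch for.
A minimal sketch, assuming only that utils:::.roman2numeric exists in
your R version (the names f, src and hits are mine, for illustration):

```r
# Deparse the non-exported function and keep only the lines
# that mention grepl(); these contain the patterns to inspect.
f <- utils:::.roman2numeric
src <- deparse(f)
hits <- grep("grepl", src, fixed = TRUE, value = TRUE)
print(hits)
```

Each printed pattern can then be tried directly with grepl(<pattern>, cc).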

I'm close to sure it is this:

> grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)
[1] TRUE TRUE TRUE TRUE TRUE

where you presumably do not get the same, but rather

  [1] TRUE TRUE TRUE TRUE FALSE

which I *do* get as well when I use  grepl(....., perl=TRUE);
see also below.


The code we use is our own tweaked version of 'TRE' (in <Rsrc>/extra/tre/ ),
and I do think we've occasionally seen platform dependencies.
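
To compare platforms, it may be useful to report which regex libraries
each R build uses. A small sketch with extSoftVersion() from base R; I
am assuming the "PCRE" and "TRE" entries are present, which they should
be in recent R versions (the name v is just for illustration):

```r
# Versions of the external regex libraries this R build was built with,
# plus the R version itself, for side-by-side comparison across machines.
v <- extSoftVersion()
print(v[intersect(c("PCRE", "TRE"), names(v))])
print(R.version.string)
```

Posting this output from both a working and a failing machine should
show whether the two builds differ in their regex libraries.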

Also, yes, there were several changes in 2022, related to
bug fixes, though several of them were made *before* the release of R 4.3.0.

Last, but not (at all!) least:

Actually, I *am* confused a bit why this ever worked (and still
works for most of us):

I'm using {,2} instead of {,4}  to make things faster to grasp;
I see

  > grepl("^I{,2}$", c("II", "III", "IIII"))
  [1]  TRUE  TRUE FALSE
  >

and I wonder why 'I{,2}' matches 3 "I"s. I would have thought that {,2}
means "up to 2 occurrences (of the previous <entity>)"
(where here <entity> = character).

In our real example,  I{,4} matched 5 "I"s,
and as I mentioned above, the somewhat better maintained
perl=TRUE engine does *not*.
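
The difference can be checked side by side. The following is only a
sketch: it runs the same inputs through the default engine and through
perl = TRUE, once with the '{,2}' form used above and once with an
explicit lower bound, '{0,2}'. As far as I know, POSIX only defines the
interval forms {m}, {m,} and {m,n}, so the meaning of '{,2}' is left to
the implementation, which may explain engine-dependent results. The
helper name show_match is hypothetical, and tryCatch() is there because
an engine may reject a pattern outright.

```r
x <- c("II", "III", "IIII")

# Run one pattern through one engine; return the error message
# instead of stopping if the engine refuses the pattern.
show_match <- function(pattern, perl) {
  tryCatch(grepl(pattern, x, perl = perl),
           error = function(e) conditionMessage(e))
}

# Compare the implicit and the explicit lower bound in both engines.
for (p in c("^I{,2}$", "^I{0,2}$")) {
  cat(p, "\n")
  cat("  default:", format(show_match(p, FALSE)), "\n")
  cat("  perl   :", format(show_match(p, TRUE)), "\n")
}
```

With the explicit '{0,2}' form I would expect both engines to agree that
only "II" matches; for the '{,2}' form the output depends on the engine
and its version.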

We could change the code to use  I{,5}  to make 5x"I", i.e. "IIIII" 
work for you .. but then that would also match
"IIIIII" (6 x "I") for "everybody" else with our current TRE engine..


