[R] Weird and changed as.roman() behavior
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Wed Jan 15 11:41:34 CET 2025
>>>>> Jani Välimaa
>>>>> on Tue, 14 Jan 2025 20:39:19 +0200 writes:
> Hello,
> I don't know what has changed, or how to find out why as.roman() recently
> started to behave differently on Mageia Cauldron. Cauldron is the
> latest development version of Mageia Linux.
> Expected behavior:
>> as.roman(strrep("I", 1:5))
> [1] I II III IV V
> Current behavior:
>> as.roman(strrep("I", 1:5))
> [1] I II III IV <NA>
> Warning message:
> In .roman2numeric(x) : invalid roman numeral: IIIII
> as.roman() no longer converts "IIIII" to "V", and thus 'make check'
> fails when building any 4.3.x or 4.4.x version from source.
Not yet.
For me, (on Linux Fedora 40),
on current R-4.4.2, R-patched and R-devel I get the same good
results from
(cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
> (cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
[1] "I" "II" "III" "IIII" "IIIII"
[1] I II III IV V
structure(1:5, class = "roman")
>
The code behind this uses grep() and grepl()
and I assume this somehow does not work correctly on your
platform?
Digging a bit further, the crucial part in this case happens in
the (namespace-hidden) function utils:::.roman2numeric,
which you probably already know from the above warning.
For me,
(cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc)); dput(r2)
gives
> (cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc))
[1] "I" "II" "III" "IIII" "IIIII"
[1] 1 2 3 4 5
>
This must be different in your case.
You can use
debug(utils:::.roman2numeric)
and
utils:::.roman2numeric(cc)
to find out where the problem happens.
This will almost surely show that the problem is indeed in a
grepl() call.
I'm close to sure it is this:
> grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)
[1] TRUE TRUE TRUE TRUE TRUE
where you presumably do not get the same result, but rather
[1] TRUE TRUE TRUE TRUE FALSE
which I *do* get, too, if I use grepl(....., perl=TRUE);
see also below.
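
To see on a given platform whether the two regex engines disagree, a small side-by-side check along the following lines may help. This is only a diagnostic sketch: the names `pat` and `res` are made up for illustration, and no particular output is claimed, since the result is exactly what seems to vary between platforms.

```r
# Diagnostic sketch: compare R's default (TRE) engine with perl = TRUE (PCRE)
# on the pattern quoted above.  'pat' and 'res' are illustrative names only.
cc  <- strrep("I", 1:5)
pat <- "^M{,3}D?C{,4}L?X{,4}V?I{,4}$"

res <- data.frame(
  input = cc,
  tre   = grepl(pat, cc),               # default engine
  pcre  = grepl(pat, cc, perl = TRUE)   # PCRE engine
)
print(res)

# Library versions R was built against, useful when reporting differences
print(extSoftVersion()[c("PCRE", "TRE")])
```

Rows where the `tre` and `pcre` columns differ point at inputs that the two engines treat differently.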
The regex code we use by default is our own tweaked version of 'TRE'
(in <Rsrc>/extra/tre/ ), and I do think we've occasionally seen
platform dependencies there.
Also, yes, there were several changes in 2022, related to
fixing bugs, though several of them were made *before* the release of R 4.3.0.
Last, but not (at all!) least:
Actually, I *am* confused a bit why this ever worked (and still
works for most of us):
I'm using {,2} instead of {,4} to make things faster to grasp;
I see
> grepl("^I{,2}$", c("II", "III", "IIII"))
[1] TRUE TRUE FALSE
>
and I wonder why 'I{,2}' matches 3 "I"s. ... I would have thought {,2}
means "up to 2 occurrences (of the previous <entity>)"
(where here <entity> = character).
In our real example, I{,4} matched 5 "I"s
and as I mentioned above, the somewhat more maintained
perl=TRUE option does *not*.
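
One way to take the ambiguity out of the question is to write the lower bound explicitly. The sketch below compares the open form {,2} with the explicit form {0,2}; the variable names are illustrative only. No output is claimed for the open form, since that is the platform-dependent part, whereas the explicit form should mean "0 to 2 occurrences" in both engines.

```r
# Sketch: open lower bound {,2} versus explicit lower bound {0,2}.
x <- c("II", "III", "IIII")

open_tre    <- grepl("^I{,2}$",  x)               # result may vary, see above
closed_tre  <- grepl("^I{0,2}$", x)               # explicit bound, default engine
closed_pcre <- grepl("^I{0,2}$", x, perl = TRUE)  # explicit bound, PCRE

print(rbind(open_tre, closed_tre, closed_pcre))
```

With the explicit bound, only "II" should match in both engines.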
We could change the code to use I{,5} to make 5x"I", i.e. "IIIII"
work for you .. but then that would also match
"IIIIII" (6 x "I") for "everybody" else with our current TRE engine..
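
As a purely hypothetical illustration of that trade-off (this is not the pattern used in utils, and not a proposed patch), a variant with explicit lower bounds and an upper bound of 5 for "I" would be expected to accept "IIIII" and reject "IIIIII" regardless of how the engine treats an omitted lower bound. The name `pat5` is invented for this sketch.

```r
# Hypothetical variant of the validity pattern: explicit lower bounds,
# and up to 5 "I"s so that "IIIII" is accepted.  Not the code in utils.
pat5 <- "^M{0,3}D?C{0,4}L?X{0,4}V?I{0,5}$"

cc6     <- strrep("I", 1:6)
ok_tre  <- grepl(pat5, cc6)               # default engine
ok_pcre <- grepl(pat5, cc6, perl = TRUE)  # PCRE engine

print(data.frame(input = cc6, tre = ok_tre, pcre = ok_pcre))
```

Whether accepting five "I"s is desirable at all is a separate question; the point here is only that explicit bounds make the accepted count the same for both engines.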