[R] Weird and changed as.roman() behavior
Jani Välimaa
w@||y @end|ng |rom m@ge|@@org
Thu Jan 16 16:34:42 CET 2025
On Wed, 15 Jan 2025 11:41:34 +0100
Martin Maechler wrote:
> >>>>> Jani Välimaa
> >>>>> on Tue, 14 Jan 2025 20:39:19 +0200 writes:
>
> > Hello,
> > I don't know what's changed or how to figure out why as.roman() started
> > to work differently lately on Mageia Cauldron. Cauldron is the
> > latest development version of Mageia Linux.
>
> > Expected behavior:
> >> as.roman(strrep("I", 1:5))
> > [1] I II III IV V
>
> > Current behavior:
> >> as.roman(strrep("I", 1:5))
> > [1] I II III IV <NA>
> > Warning message:
> > In .roman2numeric(x) : invalid roman numeral: IIIII
>
> > as.roman() doesn't handle "IIIII" -> "V" anymore and thus 'make check'
> > fails when building any 4.3.x or 4.4.x versions from the sources.
>
> Not yet.
> For me, (on Linux Fedora 40),
> on current R-4.4.2, R-patched and R-devel I get the same good
> results from
>
> (cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
>
> > (cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
> [1] "I" "II" "III" "IIII" "IIIII"
> [1] I II III IV V
> structure(1:5, class = "roman")
> >
>
> The code behind this uses grep() and grepl()
> and I assume this somehow does not work correctly on your
> platform?
>
> Digging a bit further, the crucial part in this case happens in
> the (namespace hidden) function utils:::.roman2numeric
> which you probably already know from the above warning.
> For me,
>
> (cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc)); dput(r2)
>
> gives
>
> > (cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc))
> [1] "I" "II" "III" "IIII" "IIIII"
> [1] 1 2 3 4 5
> >
>
> this must be different in your case.
>
> You can use
> debug(utils:::.roman2numeric)
> and
> utils:::.roman2numeric(cc)
>
> to find out where the problem happens.
> This will show almost surely that the problem is indeed in a
> grepl() call.
>
> I'm close to sure it is this:
>
> > grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)
> [1] TRUE TRUE TRUE TRUE TRUE
>
> where you don't get the same, but probably
>
> [1] TRUE TRUE TRUE TRUE FALSE
>
> which I *do* get, too, if I use grepl(....., perl=TRUE)
> .. see also below.
>
>
> The code we use is our own tweaked version of 'TRE' (in <Rsrc>/extra/tre/ ),
> and I do think we've occasionally seen platform dependencies.
>
> Also, yes, in 2022 there were several changes related to
> fixing bugs, though several of them came *before* the release of R 4.3.0.
>
> Last, but not (at all!) least:
>
> Actually, I *am* confused a bit why this ever worked (and still
> works for most of us):
>
> I'm using {,2} instead of {,4} to make things faster to grasp;
> I see
>
> > grepl("^I{,2}$", c("II", "III", "IIII"))
> [1] TRUE TRUE FALSE
> >
>
> and I wonder why 'I{,2}' matches 3 "I"s. ... I'd thought {,2} to
> mean " up to 2 occurrences (of the previous <entity>)"
> (where here <entity> = character).
>
> In our real example, I{,4} matched 5 "I"s
>
> and as I mentioned above, the somewhat more maintained
> perl=TRUE option does *not*.
>
> We could change the code to use I{,5} to make 5x"I", i.e. "IIIII"
> work for you .. but then that would also match
> "IIIIII" (6 x "I") for "everybody" else with our current TRE engine..
>
Thanks for your insights.
Mageia uses the system TRE with R via the --with-system-tre configure option.
TRE was updated some time ago to version 0.9.0, and it looks like the
'issue' started at the same time.
And indeed, as.roman() works as before after I rebuilt R with the bundled
TRE 0.8.0 using --with-system-tre=no.
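
For anyone wanting to check which TRE their own R build uses, extSoftVersion() reports the external libraries R is running with. A minimal sketch; the exact format of the version strings differs between builds, so they are only printed here:

```r
# Report the regex libraries used by this R build.
# extSoftVersion() returns a named character vector; the "TRE" entry
# describes the TRE library behind the default grep()/grepl() engine,
# and "PCRE" the one used for perl = TRUE.
sv <- extSoftVersion()
print(sv[c("TRE", "PCRE")])
```

Comparing this output between a build configured with --with-system-tre and one using the bundled copy should show whether the two really use different TRE versions.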
So something changed in TRE 0.9.0 that affects grepl().
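
To see the difference on a given build without going through as.roman(), the open-ended bound from the pattern quoted above can be compared with an explicit lower bound. This is only a diagnostic sketch: the variable names are made up for illustration, and the result for "{,4}" is deliberately not stated, since that is exactly the part that differs between TRE versions.

```r
# Test strings "I", "II", ..., "IIIII", as in the original report.
cc <- strrep("I", 1:5)

# Open lower bound, as used in the quoted pattern. The result for
# "IIIII" depends on the regex engine and its version.
open_bound <- grepl("^I{,4}$", cc)

# Explicit lower bound: zero to four "I"s, so the five-character
# string is not expected to match.
explicit_bound <- grepl("^I{0,4}$", cc)

print(rbind(open_bound, explicit_bound))
```

If the two rows differ in the last column, the build treats "{,4}" as allowing one more repetition than "{0,4}", which is the behavior the roman numeral pattern has been relying on.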