[R] Weird and changed as.roman() behavior

Jani Välimaa w@||y @end|ng |rom m@ge|@@org
Thu Jan 16 16:34:42 CET 2025


On Wed, 15 Jan 2025 11:41:34 +0100
Martin Maechler wrote:

> >>>>> Jani Välimaa 
> >>>>>     on Tue, 14 Jan 2025 20:39:19 +0200 writes:  
> 
>     > Hello,
>     > I don't know what has changed, or how to figure out why as.roman() has
>     > started to behave differently lately on Mageia Cauldron. Cauldron is the
>     > latest development version of Mageia Linux.  
> 
>     > Expected behavior:  
>     >> as.roman(strrep("I", 1:5))  
>     > [1] I   II  III IV  V    
> 
>     > Current behavior:  
>     >> as.roman(strrep("I", 1:5))  
>     > [1] I    II   III  IV   <NA>
>     > Warning message:
>     > In .roman2numeric(x) : invalid roman numeral: IIIII  
> 
>     > as.roman() doesn't handle "IIIII" -> "V" anymore and thus 'make check'
>     > fails when building any 4.3.x or 4.4.x versions from the sources.  
> 
> Not yet.
> For me, (on Linux Fedora 40),
> on current R-4.4.2,  R-patched and R-devel  I get the same good
> results from
> 
>  (cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
>  
>   > (cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)  
>   [1] "I"     "II"    "III"   "IIII"  "IIIII"
>   [1] I   II  III IV  V  
>   structure(1:5, class = "roman")
>   >  
> 
> The code behind this uses grep() and grepl()
> and I assume this somehow does not work correctly on your
> platform?
> 
> Digging a bit further, the crucial part in this case happens in
> the (namespace hidden) function   utils ::: .roman2numeric
> which you probably already know from the above warning.
> For me,
> 
>  (cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc)); dput(r2)
> 
> gives
> 
>   > (cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc))  
>   [1] "I"     "II"    "III"   "IIII"  "IIIII"
>   [1] 1 2 3 4 5
>   >  
> 
> this must be different in your case.
> 
> You can use
> 	debug(utils:::.roman2numeric)
> and
> 	utils:::.roman2numeric(cc)
> 
> to find out where the problem happens.
> This will show almost surely that the problem is indeed in a
> grepl() call.
> 
> I'm close to sure it is this:
> 
> > grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)  
> [1] TRUE TRUE TRUE TRUE TRUE
> 
> where you don't get the same, but probably
> 
>   [1] TRUE TRUE TRUE TRUE FALSE
> 
> which I *do* get, too if I use  grepl(....., perl=TRUE)
> .. see also below.
> 
> 
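A quick way to see the engine difference described above without stepping through the debugger is to run the same pattern through both engines side by side. This is only a sketch: the pattern is the one quoted above, the names `pat` and `res` are made up for illustration, and the actual TRUE/FALSE values depend on the TRE and PCRE versions R was built against, so no output is shown.

```r
# Same input as in the thread: "I", "II", ..., "IIIII"
cc <- strrep("I", 1:5)

# Pattern quoted above from utils:::.roman2numeric
pat <- "^M{,3}D?C{,4}L?X{,4}V?I{,4}$"

# Default engine (TRE) next to PCRE (perl = TRUE);
# the two columns may differ depending on the platform
res <- data.frame(
  input = cc,
  tre   = grepl(pat, cc),
  pcre  = grepl(pat, cc, perl = TRUE)
)
print(res)
```

Any row where the `tre` and `pcre` columns disagree points at an input that the two engines treat differently.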
> The code we use is our own tweaked version of 'TRE' (in <Rsrc>/extra/tre/ ),
> and I do think we've occasionally seen platform dependencies.
> 
> Also, yes, in 2022 there have been several changes, related to
> fixing bugs, though several ones *before* releasing R 4.3.0.
> 
> Last, but not (at all!) least:
> 
> Actually, I *am* confused a bit why this ever worked (and still
> works for most of us):
> 
> I'm using {,2} instead of {,4}  to make things faster to grasp;
> I see
> 
>   > grepl("^I{,2}$", c("II", "III", "IIII"))  
>   [1]  TRUE  TRUE FALSE
>   >  
> 
> and I wonder why 'I{,2}' matches 3 "I"s. ... I'd thought {,2} to
> mean " up to 2 occurrences (of the previous <entity>)"
> (where here <entity> = character).
> 
> In our real example,  I{,4} matched 5 "I"s
> 
> and as I mentioned above, the somewhat more maintained
> perl=TRUE option does *not*.
> 
> We could change the code to use  I{,5}  to make 5x"I", i.e. "IIIII" 
> work for you .. but then that would also match
> "IIIIII" (6 x "I") for "everybody" else with our current TRE engine..
> 
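For comparison, here is a small sketch of the same test with the lower bound written out. It assumes that the ambiguity lies only in the `{,n}` form with an omitted minimum; `{0,n}` is a standard interval expression that should mean "zero to n occurrences" in both engines. The variable names are illustrative only, and this is not meant as the actual change to make in utils.

```r
x <- c("II", "III", "IIII")

# Explicit lower bound: "zero to two" occurrences of "I"
tre_explicit  <- grepl("^I{0,2}$", x)
pcre_explicit <- grepl("^I{0,2}$", x, perl = TRUE)
print(rbind(tre_explicit, pcre_explicit))

# Omitted lower bound: the result depends on the regex engine in use,
# so it is only printed here, not relied upon
print(grepl("^I{,2}$", x))
```

With the explicit form, only "II" should match under either engine, which makes the intended upper limit easier to reason about.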

Thanks for your insights.

Mageia builds R against the system TRE via the --with-system-tre configure
option. TRE was updated some time ago to version 0.9.0, and it looks like the
'issue' started at the same time.

And indeed, as.roman() works as before after I rebuilt R with the bundled
TRE 0.8.0 using --with-system-tre=no.

So something changed in TRE 0.9.0 that affects grepl().
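For anyone who wants to check which regex libraries their own build uses, a minimal sketch with extSoftVersion() from base R follows. The variable name `v` is just for illustration, and the version strings will of course differ between systems.

```r
# Versions of the external libraries this R build is using
v <- extSoftVersion()

# TRE backs the default grepl(); PCRE backs grepl(..., perl = TRUE)
print(v[c("TRE", "PCRE")])
```

Comparing the "TRE" entry between a working and a failing build should show whether the system or the bundled library is in use.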
