[R] Weird and changed as.roman() behavior

Stephanie Evert
Wed Jan 15 13:18:03 CET 2025


Well, the real issue then seems to be that .roman2numeric uses an invalid regular expression:

>> grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)
> [1] TRUE TRUE TRUE TRUE TRUE

or 

>> grepl("^I{,2}$", c("II", "III", "IIII"))
>  [1]  TRUE  TRUE FALSE


Both the TRE and the PCRE specifications allow repetition quantifiers only in the forms

	{a}
	{a,b}
	{a,}

https://laurikari.net/tre/documentation/regex-syntax/
https://www.pcre.org/original/doc/html/pcrepattern.html#SEC17
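
For illustration, here is a small sketch of the three valid forms (my own toy example, not taken from the linked documentation; the test vector `s` is made up):

```r
# Toy input: strings of 0 to 4 repetitions of "a"
s <- c("", "a", "aa", "aaa", "aaaa")

grepl("^a{2}$", s)    # {a}: exactly 2      -> FALSE FALSE  TRUE FALSE FALSE
grepl("^a{1,3}$", s)  # {a,b}: from 1 to 3  -> FALSE  TRUE  TRUE  TRUE FALSE
grepl("^a{2,}$", s)   # {a,}: 2 or more     -> FALSE FALSE  TRUE  TRUE  TRUE
```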

{,2} and {,4} are thus invalid, and their behaviour appears to be undefined: PCRE and TRE handle them in different ways, and neither does what was intended.

> > grepl("^I{,2}$", c("II", "III", "IIII"))
> [1]  TRUE  TRUE FALSE

> > grepl("^I{,2}$", c("II", "III", "IIII"), perl=TRUE)
> [1] FALSE FALSE FALSE

The fix is thus easy: give the lower bound explicitly, i.e. {,4} => {0,4} (and likewise {,3} => {0,3}).
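
A minimal sketch of what that looks like, assuming the pattern quoted at the top is the one used in .roman2numeric (I have not checked this against the R sources, and the test strings are made up). With explicit lower bounds, both engines agree:

```r
# Same check as above, but with an explicit lower bound
x <- c("II", "III", "IIII")
grepl("^I{0,2}$", x)               # TRE (default)  -> TRUE FALSE FALSE
grepl("^I{0,2}$", x, perl = TRUE)  # PCRE           -> TRUE FALSE FALSE

# The validity pattern from the top of this mail, with {,n} written as {0,n}
valid <- "^M{0,3}D?C{0,4}L?X{0,4}V?I{0,4}$"
grepl(valid, c("MMXXV", "IIII", "IIIII", "MMMM"))  # -> TRUE TRUE FALSE FALSE
```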

Best,
Stephanie

