[Rd] Question about regexp edge case
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Mon Jul 29 02:02:21 CEST 2024
On StackOverflow (here:
https://stackoverflow.com/questions/78803652/why-does-gsub-in-r-match-one-character-too-many)
there was a question about this result:
> gsub("^([0-9]{,5}).*","\\1","123456789")
[1] "123456"
The OP expected "12345" as the result. Several points were raised:
- The R docs don't mention the case of {,5} for the default
perl = FALSE, which uses the TRE engine.
- perl = TRUE gives the OP's expected result of "12345".
- perl = TRUE does *not* give the documented result on at least one
system. The documented result is "123456789": "{,5}" is documented not
to be a quantifier, so it should match only the literal string "{,5}",
the pattern should fail to match, and the input should come back unchanged.
- Some regexp engines (including Perl and Awk) document that "12345"
is correct.
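
For anyone who wants to check their own system, here is a minimal sketch that puts the omitted-lower-bound form next to the explicit {0,5} form under both engines. The variable names are only illustrative. The two {,5} results are deliberately left without an expected value, since they seem to depend on the engine and on the version of the regex library R was built against.

```r
x <- "123456789"

# Explicit lower bound: an unambiguous "0 to 5 digits" quantifier.
explicit_tre  <- gsub("^([0-9]{0,5}).*", "\\1", x)               # default engine (TRE)
explicit_pcre <- gsub("^([0-9]{0,5}).*", "\\1", x, perl = TRUE)  # PCRE

# Omitted lower bound: the case in question. No expected value is given
# here because the result may differ by engine and library version.
omitted_tre  <- gsub("^([0-9]{,5}).*", "\\1", x)
omitted_pcre <- gsub("^([0-9]{,5}).*", "\\1", x, perl = TRUE)

print(c(explicit_tre  = explicit_tre,
        explicit_pcre = explicit_pcre,
        omitted_tre   = omitted_tre,
        omitted_pcre  = omitted_pcre))
```

Both explicit forms should give "12345", so writing {0,5} looks like a portable workaround whatever is decided about {,5}.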
Is any of this worth fixing?
Duncan Murdoch