[Rd] Question about regexp edge case
    Duncan Murdoch 
    murdoch@dunc@n @end|ng |rom gm@||@com
       
    Mon Jul 29 02:02:21 CEST 2024
    
    
  
On Stack Overflow (here: 
https://stackoverflow.com/questions/78803652/why-does-gsub-in-r-match-one-character-too-many) 
there was a question about this result:
 > gsub("^([0-9]{,5}).*","\\1","123456789")
[1] "123456"
The OP expected "12345" as the result.  Several points were raised:
  - The R docs don't mention the {,5} form at all for the default 
perl = FALSE, which uses the TRE engine.
  - perl = TRUE gives the OP's expected result of "12345".
  - perl = TRUE does *not* give the documented result on at least one 
system. The documented result is "123456789": "{,5}" is documented not 
to be a quantifier, so it should match only the literal string "{,5}", 
the pattern then cannot match "123456789", and gsub() returns the input 
unchanged.
  - Some regexp engines (including Perl and Awk) document that "12345" 
is correct, i.e. they treat {,5} as {0,5}.
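For reference, the two competing readings of "{,5}" can each be spelled 
out unambiguously. This is a minimal sketch, assuming perl = TRUE so the 
result doesn't depend on how TRE parses the braces; the variable names 
as_quantifier and as_literal are only illustrative.

```r
x <- "123456789"

# Reading 1 (what Perl and Awk document): "{,5}" means "{0,5}".
# Writing the lower bound explicitly avoids the ambiguous form.
as_quantifier <- gsub("^([0-9]{0,5}).*", "\\1", x, perl = TRUE)

# Reading 2: "{,5}" is the literal four-character string "{,5}".
# Escaping the braces forces that reading; the pattern cannot match
# "123456789", so gsub() returns the input unchanged.
as_literal <- gsub("^([0-9]\\{,5\\}).*", "\\1", x, perl = TRUE)

as_quantifier  # "12345"
as_literal     # "123456789"
```

Code that needs "12345" regardless of engine presumably should write 
{0,5} rather than rely on how {,5} is parsed.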
Is any of this worth fixing?
Duncan Murdoch
    
    