[Rd] An interesting chat with ChatGPT

Duncan Murdoch
Mon Feb 13 14:37:22 CET 2023


I was wondering which alternative would match in the regular expression 
"(a)|(b)" if both "a" and "b" match the target string.  What appears 
to happen is that the first match wins, and that's how it is documented 
in Python, but I was looking for docs on this in R.
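
For reference, the documented Python behaviour mentioned above can be 
sketched as follows.  This only illustrates Python's re module, where 
alternatives are tried from left to right; it says nothing about what R's 
regex engines do.  The helper name winning_group is hypothetical and 
exists only for this illustration.

```python
import re

def winning_group(pattern, text):
    """Return the 1-based index of the capture group that took part in
    the first match of `pattern` in `text`, or None if nothing matched."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # Exactly one side of a top-level alternation participates in a match;
    # the groups on the other side are reported as None.
    for i in range(1, m.re.groups + 1):
        if m.group(i) is not None:
            return i
    return None

# Both alternatives can match "ab" at position 0; the left one is used.
print(winning_group(r"(a\w)|(ab)", "ab"))   # 1
print(winning_group(r"(ab)|(a\w)", "ab"))   # 1

# The choice is visible in substitution: "a" wins over the longer "ab".
print(re.sub(r"(a)|(ab)", r"<\g<0>>", "ab"))  # <a>b
```

The last line shows why the question matters for sub() and gsub() style 
functions: which alternative wins changes the text that gets replaced.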

As far as I can see, the ?regex help page doesn't resolve this. So just 
for fun, I decided to ask ChatGPT, and got an interesting response.  I 
won't show the whole conversation here, but just some extracts:

---------------------------------------
Me:  In a regular expression in R, if both "a" and "b" match, which does 
"(a)|(b)" match?  Please include authoritative references in your answer.

ChatGPT: [long answer about grep]

Me:  In grep(), it doesn't matter which of a or b matches.  It's only in 
sub() or gsub() that it makes a difference.  Do you know an 
authoritative reference that discusses those functions?

ChatGPT:  [long irrelevant answer about sub and gsub, with a reference 
to "R Cookbook" by P. Teetor.]

Me:  Teetor's book is not authoritative, and the other reference doesn't 
discuss ambiguous alternation.

... finally we get to the interesting part ...

ChatGPT: I apologize for the mistake in my previous answer.

A more authoritative reference for the behavior of regular expressions 
in R can be found in the official R documentation.

Regarding the behavior of ambiguous alternation, the regex section of 
the manual "An Introduction to R" explains it as follows:

"When several alternatives are possible, the first alternative is 
chosen, so it is important to specify alternatives in the order in which 
they should be tried."

Here is the reference for the official R documentation:

     "An Introduction to R" manual: 
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Regular-expression-facilities

I hope this reference provides the information you were looking for.
---------------------------------------


All right, an authoritative answer!  The only problem is that *it 
doesn't exist*.  There's no such anchor in that document.  R-intro isn't 
a very rapidly changing document, so I could look at every change 
recorded in svn, and that section has *never* existed.

I guess ChatGPT is a pattern matching machine, and if such a question 
had an answer, it would probably be in a location like that.  (I would 
have guessed a help page like ?regex instead of R-intro, but I had just 
told it that answer was wrong.)

Duncan Murdoch


