[Rd] An interesting chat with ChatGPT
Duncan Murdoch
Mon Feb 13 14:37:22 CET 2023
I was wondering which alternative would match in the regular expression
"(a)|(b)" if both "a" and "b" match the target string. What appears
to happen is that the first match wins, and that's how it is documented
in Python, but I was looking for docs on this in R.
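For reference, the Python side of this is easy to check. Below is a minimal sketch using only Python's standard re module. The target string "ab" and the patterns "(a)|(ab)" and "(ab)|(a)" are my own illustrative choices, picked so that both alternatives can match at the same position. It says nothing about what R's regex engines do, which is the open question.

```python
import re

target = "ab"  # illustrative target: both "a" and "ab" match at position 0

# Alternatives are tried left to right; the first one that matches is taken.
first = re.search(r"(a)|(ab)", target)
print(first.group(0))   # a
print(first.groups())   # ('a', None)

# Swapping the order changes which alternative wins.
swapped = re.search(r"(ab)|(a)", target)
print(swapped.group(0))  # ab

# In a substitution the choice becomes visible in the result.
sub_first = re.sub(r"(a)|(ab)", "X", target)
sub_swapped = re.sub(r"(ab)|(a)", "X", target)
print(sub_first)    # Xb
print(sub_swapped)  # X
```

This also shows why the question only matters for substitution or for capture groups: a plain yes/no match test gives the same answer whichever alternative is chosen.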
As far as I can see, the ?regex help page doesn't resolve this. So just
for fun, I decided to ask ChatGPT, and got an interesting response. I
won't show the whole conversation here, but just some extracts:
---------------------------------------
Me: In a regular expression in R, if both "a" and "b" match, which does
"(a)|(b)" match? Please include authoritative references in your answer.
ChatGPT: [long answer about grep]
Me: In grep(), it doesn't matter which of a or b matches. It's only in
sub() or gsub() that it makes a difference. Do you know an
authoritative reference that discusses those functions?
ChatGPT: [long irrelevant answer about sub and gsub, with a reference
to "R Cookbook" by P. Teetor.]
Me: Teetor's book is not authoritative, and the other reference doesn't
discuss ambiguous alternation.
... finally we get to the interesting part ...
ChatGPT: I apologize for the mistake in my previous answer.
A more authoritative reference for the behavior of regular expressions
in R can be found in the official R documentation.
Regarding the behavior of ambiguous alternation, the regex section of
the manual "An Introduction to R" explains it as follows:
"When several alternatives are possible, the first alternative is
chosen, so it is important to specify alternatives in the order in which
they should be tried."
Here is the reference for the official R documentation:
"An Introduction to R" manual:
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Regular-expression-facilities
I hope this reference provides the information you were looking for.
---------------------------------------
All right, an authoritative answer! The only problem is that *it
doesn't exist*. There's no such anchor in that document. R-intro isn't
a very rapidly changing document, so I could look at every change
recorded in svn, and that section has *never* existed.
I guess ChatGPT is a pattern matching machine, and if such a question
had an answer, it would probably be in a location like that. (I would
have guessed a help page like ?regex instead of R-intro, but I had just
told it that answer was wrong.)
Duncan Murdoch