[R] regexec: Unexpected answer when matching digits

Duncan Murdoch murdoch.duncan at gmail.com
Sun May 4 23:51:03 CEST 2014


On 04/05/2014, 5:03 PM, Stephen Sentoff wrote:
> I was trying to use regexec to extract number fields from my data and got an unexpected response.  I can reproduce the issue with this small test case.
>
> regexec("\\d{2,}", "abcd123")
>
> I get a match at position 1, for length 7.  Not what I expected.
>
> I do get the expected response (match at position 5, for length 3) when I do any of the following:
>
> regexec("[0-9]{2,}", "abcd123")
> regexec("\\d{1,}", "abcd123")
> regexec("\\d+", "abcd123")
>
> I have also verified that regexpr handles this pattern as I expect.
>
> And to add further confusion, this only seems to happen on my Linux machine, not on Windows.
>
> This seems to be an incredibly specific condition.  Anybody know what's going on?
>

It looks like a bug.  I can reproduce it in R 3.0.3 on Mac OS, but not in
R 3.1.0-patched on Windows.  Which version of R are you using, and on which OS?
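
To narrow it down, something like the following sketch might help.  It runs
the four patterns from your message against the same string and prints the
start and length of each match, along with the version and platform strings
worth including in a report.  The helper name match_span is made up for
illustration; everything else is base R.

```r
# The four patterns from the report and the test string.
patterns <- c("\\d{2,}", "[0-9]{2,}", "\\d{1,}", "\\d+")
x <- "abcd123"

# Hypothetical helper: start position and length of the overall
# regexec() match for one pattern against one string.
match_span <- function(pattern, text) {
  m <- regexec(pattern, text)[[1]]
  c(start = m[1], length = attr(m, "match.length")[1])
}

# One row per pattern.  On an unaffected build every row should agree;
# on an affected build the first row differs from the others.
spans <- t(sapply(patterns, match_span, text = x))
print(spans)

# regexpr() was reported to behave as expected, so it can serve as a
# cross-check (and a possible workaround) for the same pattern.
print(regmatches(x, regexpr("\\d{2,}", x)))

# Version and platform details to quote when reporting the problem.
print(R.version.string)
print(R.version$platform)
```

If the first row of the table disagrees with the other three, that confirms
the problem is specific to "\\d" combined with the "{2,}" quantifier in
regexec on that build.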

Duncan Murdoch


