[R] regexec: Unexpected answer when matching digits
Stephen Sentoff
shsentoff at comcast.net
Mon May 5 01:26:58 CEST 2014
Here is my sessionInfo from the Linux machine where I see this behavior.
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.1.0
--
Steve Sentoff
On 5/4/2014 4:51 PM, Duncan Murdoch wrote:
> On 04/05/2014, 5:03 PM, Stephen Sentoff wrote:
>> I was trying to use regexec to extract number fields from my data and
>> got an unexpected response. I can reproduce the issue with this
>> small test case.
>>
>> regexec("\\d{2,}", "abcd123")
>>
>> I get a match at position 1, for length 7. Not what I expected.
>>
>> I do get the expected response (match at position 5, for length 3)
>> when I do any of the following:
>>
>> regexec("[0-9]{2,}", "abcd123")
>> regexec("\\d{1,}", "abcd123")
>> regexec("\\d+", "abcd123")
>>
>> I have also verified that regexpr handles this pattern as I expect.
>>
>> And to add further confusion, this only seems to happen on my Linux
>> machine, not on Windows.
>>
>> This seems to be an incredibly specific condition. Anybody know
>> what's going on?
>>
>
> It looks like a bug. I see it in R 3.0.3 on Mac OS, but not in
> 3.1.0-patched on Windows. What version are you using, on what OS?
>
> Duncan Murdoch
>
>
>
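For anyone who hits the same behavior, here is a minimal workaround sketch based only on the observations quoted above: the bracket-expression form "[0-9]{2,}" was reported to match at position 5 for length 3, so it may serve as a stand-in for "\\d{2,}" in regexec. The variable names (x, m, start, len, digits) are illustrative, and whether this sidesteps the problem on every affected build is an assumption, not something confirmed in this thread.

```r
# Test string from the original report.
x <- "abcd123"

# Bracket-expression form that the report says behaves as expected,
# used here in place of "\\d{2,}".
m <- regexec("[0-9]{2,}", x)

# Start position and length of the overall match.
start <- m[[1]][1]
len <- attr(m[[1]], "match.length")[1]

# Extract the matched text with regmatches().
digits <- regmatches(x, m)[[1]]
print(digits)
```

Comparing start and len from this form against the result of regexec("\\d{2,}", x) on the same machine is a quick way to check whether a given R build shows the discrepancy.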