[R] regexec: Unexpected answer when matching digits

Stephen Sentoff shsentoff at comcast.net
Mon May 5 01:26:58 CEST 2014


Here is my sessionInfo() output from the Linux machine where I see this behavior.

R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.1.0
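
In case it helps anyone reproduce this on their own setup, here is a minimal sketch that runs the four patterns from my original message (quoted below) through regexec() and puts the result next to what regexpr() reports with perl = TRUE. The helper name compare_match is made up for illustration, and using the PCRE result as the "expected" answer is my assumption, not something the documentation promises.

```r
# Subject string and patterns from the test case quoted below.
x <- "abcd123"
patterns <- c("\\d{2,}", "[0-9]{2,}", "\\d{1,}", "\\d+")

# compare_match is a hypothetical helper: it returns the start and
# length of the first match from regexec() (default engine) and from
# regexpr(perl = TRUE) (PCRE), so the two can be compared side by side.
compare_match <- function(pattern, text) {
  m_exec <- regexec(pattern, text)[[1]]
  m_perl <- regexpr(pattern, text, perl = TRUE)
  c(exec_start = m_exec[1],
    exec_len   = attr(m_exec, "match.length")[1],
    perl_start = as.integer(m_perl),
    perl_len   = as.integer(attr(m_perl, "match.length")))
}

# One row per pattern; rows where the exec_* and perl_* columns
# disagree are the ones showing the unexpected behavior.
res <- t(sapply(patterns, compare_match))
print(res)
```

On a build without the problem I would expect every row to agree; on my Linux machine only the "\\d{2,}" row should differ in the regexec() columns.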

-- 
Steve Sentoff

On 5/4/2014 4:51 PM, Duncan Murdoch wrote:
> On 04/05/2014, 5:03 PM, Stephen Sentoff wrote:
>> I was trying to use regexec to extract number fields from my data and 
>> got an unexpected response.  I can reproduce the issue with this 
>> small test case.
>>
>> regexec("\\d{2,}", "abcd123")
>>
>> I get a match at position 1, for length 7.  Not what I expected.
>>
>> I do get the expected response (match at position 5, for length 3) 
>> when I do any of the following:
>>
>> regexec("[0-9]{2,}", "abcd123")
>> regexec("\\d{1,}", "abcd123")
>> regexec("\\d+", "abcd123")
>>
>> I have also verified that regexpr handles this pattern as I expect.
>>
>> And to add further confusion, this only seems to happen on my Linux 
>> machine, not on Windows.
>>
>> This seems to be an incredibly specific condition.  Anybody know 
>> what's going on?
>>
>
> It looks like a bug.  I see it in R 3.0.3 on Mac OS, but not in 
> 3.1.0-patched on Windows.  What version are you using, on what OS?
>
> Duncan Murdoch
>