[R] regexec: Unexpected answer when matching digits

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon May 5 07:04:00 CEST 2014


On 05/05/2014 00:26, Stephen Sentoff wrote:
> Here is my sessionInfo from the linux machine where I see this behavior.
>
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
>  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
> [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] tools_3.1.0

It is a known bug in the TRE engine, PR#14408.  It is triggered only in
a UTF-8 locale by a pattern containing a range repeat modifier (here
{2,}), and can be worked around by using perl=TRUE where supported (I do
not know why the author of regexec did not support it).
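
A minimal sketch of that workaround, assuming the goal is to pull out a run of two or more digits and, optionally, a capture group. The input strings and patterns below are hypothetical, since the original example is not quoted in this message. Because regexec() has no perl argument here, the sketch uses regexpr(), which does accept perl=TRUE and, under PCRE, records capture-group positions as attributes on its result.

```r
# Hypothetical inputs and patterns, for illustration only.
x <- c("abc 12345 def", "no digits", "7 and 89")
pattern <- "[0-9]{2,}"

# The bug described above is only expected in a UTF-8 locale;
# this reports whether the current session uses one.
in_utf8 <- l10n_info()[["UTF-8"]]

# perl = TRUE routes matching through PCRE instead of TRE.
m <- regexpr(pattern, x, perl = TRUE)

# regmatches() returns the matched substrings and drops non-matching elements.
hits <- regmatches(x, m)

# Capture groups, which regexec() would normally provide, can be recovered
# from the "capture.start" and "capture.length" attributes of a PCRE match.
y <- "id=12345;"
my <- regexpr("id=([0-9]{2,})", y, perl = TRUE)
st <- attr(my, "capture.start")[1, 1]
len <- attr(my, "capture.length")[1, 1]
group <- substring(y, st, st + len - 1L)

print(hits)
print(group)
```

Each of these matrix attributes has one row per input string and one column per capture group, so the same indexing extends to several inputs or several groups.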

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


