[R] Regex engine types
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sat Jun 10 08:47:07 CEST 2006
?regex does describe this:
A range of characters may be specified by giving the first and last
characters, separated by a hyphen. (Character ranges are
interpreted in the collation order of the current locale.)
You did not tell us your locale, but based on questions from you in the
past I would guess en_NZ.utf8. In that locale the collation order is
wWxXyYzZ, so your surprise is explained. (It seems the PCRE code is not
using the same ordering in that locale.)
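One way to sidestep the locale dependence, sketched here on the assumption that the aim is to match only the upper-case letters W to Z, is to list the characters explicitly instead of giving a range. The variable names are illustrative and not from the original post. The sort() call is simply one way to inspect the collation order in use; its result varies by locale, so no output is shown for it.

```r
# Inspect how the current locale collates these letters (result is locale-dependent).
print(sort(c(letters[23:26], LETTERS[23:26])))

# An explicit list of characters involves no range, so collation order plays no part.
upper_hits <- grep("[WXYZ]", LETTERS, value = TRUE)
lower_hits <- grep("[WXYZ]", letters, value = TRUE)

print(upper_hits)  # "W" "X" "Y" "Z"
print(lower_hits)  # character(0)
```

The trade-off is verbosity: an explicit list is longer than a range, but it behaves the same in every locale.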
You may find it useful to set LC_COLLATE to C as I do:
> strsplit(Sys.getlocale(), ";")
[[1]]
 [1] "LC_CTYPE=en_GB"       "LC_NUMERIC=C"         "LC_TIME=en_GB"
 [4] "LC_COLLATE=C"         "LC_MONETARY=en_GB"    "LC_MESSAGES=en_GB"
 [7] "LC_PAPER=en_GB"       "LC_NAME=C"            "LC_ADDRESS=C"
[10] "LC_TELEPHONE=C"       "LC_MEASUREMENT=en_GB" "LC_IDENTIFICATION=C"
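For a single session, the collation category can also be switched from within R. The following is a minimal sketch, assuming it is acceptable to change LC_COLLATE temporarily and restore it afterwards; the variable names are illustrative. In the C locale strings are compared by character code, so upper-case ASCII letters sort before lower-case ones.

```r
# Remember the current collation setting so it can be restored later.
old_collate <- Sys.getlocale("LC_COLLATE")

# Switch collation to the C locale for this session.
invisible(Sys.setlocale("LC_COLLATE", "C"))

# With C collation, sorting follows character codes: upper case first.
sorted_c <- sort(c("x", "W", "w", "X"))
print(sorted_c)  # "W" "X" "w" "x"

# Restore the previous setting.
invisible(Sys.setlocale("LC_COLLATE", old_collate))
```

Setting LC_COLLATE in the environment before starting R has the same effect for the whole session.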
On Sat, 10 Jun 2006, Patrick Connolly wrote:
>> version
>          _
> platform x86_64-unknown-linux-gnu
> arch     x86_64
> os       linux-gnu
> system   x86_64, linux-gnu
> status
> major    2
> minor    2.1
> year     2005
> month    12
> day      20
> svn rev  36812
> language R
>>
>
>> grep("[W-Z]", LETTERS, value = TRUE)
> [1] "W" "X" "Y" "Z"
>
> That's what I'd have expected.
>
>> grep("[W-Z]", letters, value = TRUE)
> [1] "x" "y" "z"
>
> Not what I'd have thought. However,
>
>> grep("[B-D]", letters, value = TRUE, perl = TRUE)
> character(0)
>
> So what is it that standard regular expressions use that's different
> from Perl-type ones?
>
> The help file for grep refers to POSIX 1003.2 which looked a bit
> daunting to delve into. From my limited reading, it seems there are
> different regex "Engine Types" which seems to be getting somewhat
> tangential to what I was working on. I could probably avoid problems
> if I always set perl=TRUE, but it would be good to know what basic and
> extended regular expressions do that's different. If someone has a
> quick line or two describing it, I'd be interested to know.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595