[Rd] invalid regular expression '[a-Z]'
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Mar 6 08:09:27 CET 2008
On Wed, 5 Mar 2008, Henrik Bengtsson wrote:
> On Wed, Mar 5, 2008 at 6:18 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>> On 05/03/2008 8:56 PM, Henrik Bengtsson wrote:
>> > Hi,
>> >
>> > just curious, but does anyone know the source/reason of observing the
>> > following error on OSX but not on WinXP and Linux?
>>
>> Presumably in the locale you're using on OSX, "a" < "Z" is false. This
>> is the ascii sort order used in the C locale. On my Windows box, "a" <
>> "Z" is true, because it uses the English_Canada.1252 collation order.
>
> That's it indeed. The person who first reported the error had
> sessionInfo() locale
> 'en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8' and I
> missed that 'C' in the middle, which I guess his system falls back to
> if none of the previous ones exist?!?
No. Those are settings for the various locale categories, just as you
showed for Windows. The first setting appears to be LC_COLLATE, but what
they mean is not documented on the system man page for setlocale.
It's just that MacOS uses C collation order in English locales, even
though almost everyone else uses aAbB or AaBb (the latter being what the
English actually use, as do almost all book indices in dialects of
English). But then there is no surprise that MacOS has to be different
... its implementation of locales is idiosyncratic (to be generous).
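A minimal sketch of how one might check which collation order a session is using, and how "a" and "Z" compare under it. The variable names (old, c_result, c_sorted) are illustrative only; the first two results depend on the platform and locale, so no output is shown for them.

```r
# Which locale is in effect for the collation category?
Sys.getlocale("LC_COLLATE")

# Compare under the current collation; the answer depends on the locale.
"a" < "Z"

# Switch to C collation temporarily, then restore the previous setting.
old <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")
c_result <- "a" < "Z"        # FALSE: in ASCII, "Z" (90) precedes "a" (97)
c_sorted <- sort(c("a", "Z")) # "Z" "a" under C collation
Sys.setlocale("LC_COLLATE", old)
```

Under C collation every upper-case ASCII letter sorts before every lower-case one, which is why a range running from "a" to "Z" is empty there.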
Note that even [A-Za-z] is unsafe -- as I recall Z is in the middle of the
alphabet in Estonian locales. If you want alphabetic characters, use
[[:alpha:]]. If you want ASCII alphabetic characters, write out the
ranges as [AB...Zab...z]
E.g. (F8 Linux)
> Sys.setlocale("LC_COLLATE", "et_EE.utf8")
[1] "et_EE.utf8"
> paste(sort(c(letters,LETTERS)), collapse="")
[1] "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsZzTtUuVvWwXxYy"
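As a sketch of the advice above, the ASCII letter class can be built from R's built-in LETTERS and letters constants rather than typed out by hand, so that no range expression (and hence no collation order) is involved. The names ascii_letters and x are illustrative, and the test strings are made up.

```r
# Bracket expression listing all 52 ASCII letters explicitly, no ranges.
ascii_letters <- paste0("[", paste(c(LETTERS, letters), collapse = ""), "]")

x <- c("abc", "Z9", "123", "_")

# TRUE where the string contains at least one ASCII letter.
has_ascii_letter <- grepl(ascii_letters, x)

# [[:alpha:]] matches any alphabetic character in the current locale;
# for these pure-ASCII inputs it gives the same answer.
has_alpha <- grepl("[[:alpha:]]", x)
```

Here has_ascii_letter is TRUE, TRUE, FALSE, FALSE. The two patterns differ only for non-ASCII alphabetic characters, where the result of [[:alpha:]] depends on the locale.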
[...]
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595