[Rd] invalid regular expression '[a-Z]'

Henrik Bengtsson hb at stat.berkeley.edu
Thu Mar 6 09:52:58 CET 2008


On Wed, Mar 5, 2008 at 11:09 PM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
> On Wed, 5 Mar 2008, Henrik Bengtsson wrote:
>
>  > On Wed, Mar 5, 2008 at 6:18 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>  >> On 05/03/2008 8:56 PM, Henrik Bengtsson wrote:
>  >> > Hi,
>  >> >
>  >> > just curious, but does anyone know the source/reason of observing the
>  >> > following error on OSX but not on WinXP and Linux?
>  >>
>  >>  Presumably in the locale you're using on OSX, "a" < "Z" is false.  This
>  >>  is the ASCII sort order used in the C locale.  On my Windows box, "a" <
>  >>  "Z" is true, because it uses the English_Canada.1252 collation order.
>  >
>  > That's it indeed.  The person who first reported the error had
>  > sessionInfo() locale
>  > 'en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8' and I
>  > missed that 'C' in the middle, which I guess his system falls back to
>  > if none of the previous ones exist?!?
>
>  No.  Those are settings for various categories, just as you showed for
>  Windows.  The first setting appears to be LC_COLLATE, but what they mean is
>  not documented on the system man page for setlocale.
>
>  It's just that MacOS uses C collation order in English locales, even
>  though almost everyone else uses aAbB or AaBb (the latter being what the
>  English actually use, as do almost all book indices in dialects of
>  English).  But then there is no surprise that MacOS has to be different
>  ... its implementation of locales is idiosyncratic (to be generous).
>
>  Note that even [A-Za-z] is unsafe -- as I recall Z is in the middle of the
>  alphabet in Estonian locales.  If you want alphabetic characters, use
>  [[:alpha:]].  If you want ASCII alphabetic characters, write out the
>  ranges as [AB...Zab...z]
>
>  E.g. (F8 Linux)
>
>  > Sys.setlocale("LC_COLLATE", "et_EE.utf8")
>  [1] "et_EE.utf8"
>  > paste(sort(c(letters,LETTERS)), collapse="")
>  [1] "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsZzTtUuVvWwXxYy"

Alpha and Omega - you said it all.

Thanks for the clarifications.
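
For the archives, a minimal sketch of the two patterns recommended above,
so that nothing depends on the collation order of the session.  The name
'ascii_alpha' and the test vector 'x' are made up for illustration; the
rest is base R only.

```r
# Spell out every ASCII letter instead of using a range such as [a-Z] or
# [A-Za-z]; without ranges, the collation order of the locale is irrelevant.
ascii_alpha <- paste0("[", paste(c(LETTERS, letters), collapse = ""), "]")

x <- c("abc", "Zeta", "123", "_")

# TRUE for strings that contain at least one ASCII letter
has_ascii <- grepl(ascii_alpha, x)

# TRUE for strings that contain at least one alphabetic character,
# as defined by the current locale
has_alpha <- grepl("[[:alpha:]]", x)

print(has_ascii)
print(has_alpha)

# To see which collation a session actually uses:
print(Sys.getlocale("LC_COLLATE"))
```

With the four strings above, both patterns should match "abc" and "Zeta"
and reject "123" and "_".  The two only differ for non-ASCII letters,
where [[:alpha:]] follows the locale and the spelled-out class does not.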

/Henrik

>
>
>  [...]
>
>  --
>  Brian D. Ripley,                  ripley at stats.ox.ac.uk
>  Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>  University of Oxford,             Tel:  +44 1865 272861 (self)
>  1 South Parks Road,                     +44 1865 272866 (PA)
>  Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>


