[Rd] invalid regular expression '[a-Z]'
Henrik Bengtsson
hb at stat.berkeley.edu
Thu Mar 6 03:40:48 CET 2008
On Wed, Mar 5, 2008 at 6:18 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> On 05/03/2008 8:56 PM, Henrik Bengtsson wrote:
> > Hi,
> >
> > just curious, but does anyone know the source/reason of observing the
> > following error on OSX but not on WinXP and Linux?
>
> Presumably in the locale you're using on OSX, "a" < "Z" is false. This
> is the ASCII sort order used in the C locale. On my Windows box, "a" <
> "Z" is true, because it uses the English_Canada.1252 collation order.
That's it indeed. The person who first reported the error had the
sessionInfo() locale
'en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8', and I
missed the 'C' in the middle, which I guess his system falls back to
if none of the preceding ones exist?!?
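
Rather than reading the slash-separated string by eye, each locale category can be queried on its own. The following is a minimal sketch, assuming only the standard category names accepted by Sys.getlocale(); the values printed depend entirely on the system it is run on, and whether the collation category is what the regular expression code consulted in the reported session is an assumption here, not something verified.

```r
# Query each locale category separately instead of parsing the combined
# string, whose layout differs between platforms.
cats <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME")
locale_by_category <- sapply(cats, Sys.getlocale)
print(locale_by_category)

# LC_COLLATE governs string comparisons such as "a" < "Z", so it is the
# first category worth inspecting (assumption: it is also the relevant one
# for character ranges in the reported session).
collate <- Sys.getlocale("LC_COLLATE")
```
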
Now I can reproduce it on both Windows and Linux:
> Sys.setlocale("LC_ALL", "C")
[1] "C"
> regexpr("[a-Z]", "foo")
Error in regexpr("[a-Z]", "foo") : invalid regular expression '[a-Z]'
> Sys.setlocale("LC_ALL", "en")
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
> regexpr("[a-Z]", "foo")
[1] 1
attr(,"match.length")
[1] 1
Case almost closed, but one question remains: why doesn't at least one
of the two patterns '[a-Z]' and '[A-z]' give an error in the other
locale(s)?
> Sys.setlocale("LC_ALL", "en")
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
> regexpr("[a-Z]", "foo")
[1] 1
attr(,"match.length")
[1] 1
> regexpr("[A-z]", "foo")
[1] 1
attr(,"match.length")
[1] 1
> "a" < "Z"
[1] TRUE
> "a" > "Z"
[1] FALSE
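
For contrast, the same comparisons can be repeated under the C collation, where strings compare by byte value. The sketch below switches only LC_COLLATE and restores it afterwards; the variable names are illustrative. If the English collation interleaves the cases (a < A < b < B < ... < z < Z), then both "a" < "Z" and "A" < "z" would hold there, which would be consistent with neither range being rejected, but that is an assumption rather than something checked here. The two patterns at the end do not rely on the relative order of upper and lower case.

```r
# Switch only the collation category to "C" and restore it afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")

# In the C locale "A".."Z" (bytes 65..90) sort before "a".."z" (97..122).
a_before_Z <- "a" < "Z"   # FALSE
A_before_z <- "A" < "z"   # TRUE

Sys.setlocale("LC_COLLATE", old_collate)

# Patterns that do not depend on how the locale orders the two cases.
portable <- regexpr("[a-zA-Z]", "foo")
posix    <- regexpr("[[:alpha:]]", "foo")
```
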
Thanks
/Henrik
>
> Duncan Murdoch
>
>
> > I've tried with a
> > few different versions of R (v2.5.1, v2.6.1, v2.6.2, v2.7.0devel).
> > The locale does not seem to affect the error, i.e. I've tested a few
> > different and it is still only OSX that gives the error but not the
> > other two.
> >
> >> regexpr("[a-Z]", "foo")
> > Error in regexpr(pattern, text, extended, fixed, useBytes) :
> > invalid regular expression '[a-Z]'
> >> regexpr("[a-zA-Z]", "foo")
> > [1] 1
> > attr(,"match.length")
> > [1] 1
> >> regexpr("[A-z]", "foo")
> > [1] 1
> > attr(,"match.length")
> > [1] 1
> >
> > At least now I know that the safest is to use '[a-zA-Z]' (or
> > possibly '[[:alpha:]]').
> >
> > /Henrik
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>