[Rd] Change in grep behavior from 1.9.0 to R-patched

Fri Jun 11 17:40:34 CEST 2004

>>>>> "BDR" == Prof Brian Ripley <ripley at stats.ox.ac.uk>
>>>>>     on Fri, 11 Jun 2004 16:28:37 +0100 (BST) writes:

    BDR> This is actually PCRE.  Something is wrong with your build of R-patched
    BDR> (1.9.1 alpha, I assume): I get 84 everywhere.  You are asking for a first
    BDR> character l, then one or more characters of `word' then tmean.  In your
    BDR> example this is the same as (in a suitable locale, including C)

    BDR> length(grep("^l[A-Za-z0-9]+tmean", x, perl = TRUE, value = TRUE))
    BDR> length(grep("^l[[:alnum:]_]+tmean", x, perl = TRUE, value = TRUE))

    BDR> which each give 84.

    BDR> One issue: PCRE is locale-dependent.  Did you use the same locale for 
    BDR> each?  What happens if you force LANG=C?
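Below is a minimal in-session sketch of BDR's check. It assumes the original pattern was `^l\\w+tmean`; the thread describes that pattern but does not show it. It uses a made-up vector `x` because the real data is not in the thread, and the helper `count_in_locale` is hypothetical. `Sys.setlocale()` only changes the locale of the running R session. BDR's suggestion was to set `LANG=C` in the environment before starting R, so this approximates his test rather than reproducing it.

```r
# Hypothetical stand-in for the thread's vector x (the real data is not shown).
x <- c("lfootmean", "lbartmean", "lx1tmean", "ltmean", "Lfootmean", "mfootmean")

# Hypothetical helper: count perl = TRUE matches of `pattern` in `x` with
# LC_CTYPE temporarily set to `locale`, restoring the caller's setting on exit.
count_in_locale <- function(pattern, x, locale = "C") {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  invisible(Sys.setlocale("LC_CTYPE", locale))
  length(grep(pattern, x, perl = TRUE, value = TRUE))
}

# Assumed form of the original pattern: 'l', one or more word characters, 'tmean'.
n_word  <- count_in_locale("^l\\w+tmean", x)
# BDR's two equivalent spellings for ASCII data without underscores.
n_alnum <- count_in_locale("^l[A-Za-z0-9]+tmean", x)
n_class <- count_in_locale("^l[[:alnum:]_]+tmean", x)

# All three agree for this x: 3 matches each in the C locale.
c(word = n_word, alnum = n_alnum, class = n_class)
```

Running the same three calls with `locale` set to the session's usual value, such as `"de_CH"` or `"en_US.UTF-8"`, shows whether the `\w` count drifts away from the explicit character classes.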

For me:

- I did use the same locale for both R versions
   (LC_CTYPE=de_CH and no explicit LANG, which, as I only now realize,
   nowadays means "en_US.UTF-8" on Red Hat Enterprise Linux; aaaargh)
- Forcing LANG=C helps (giving 84).

and I was wrong in saying that we had upgraded PCRE between 1.9.0
and R-patched.

Still, it is quite peculiar that the same locale settings lead to
different PCRE behavior.
Could it be that the locale at *BUILD* time plays a role as well?
That might explain it for me.

Martin