[Rd] Change in grep behavior from 1.9.0 to R-patched
Martin Maechler
maechler at stat.math.ethz.ch
Fri Jun 11 17:40:34 CEST 2004
>>>>> "BDR" == Prof Brian Ripley <ripley at stats.ox.ac.uk>
>>>>> on Fri, 11 Jun 2004 16:28:37 +0100 (BST) writes:
BDR> This is actually PCRE. Something is wrong with your build of R-patched
BDR> (1.9.1 alpha, I assume): I get 84 everywhere. You are asking for a first
BDR> character l, then one or more characters of `word' then tmean. In your
BDR> example this is the same as (in a suitable locale, including C)
BDR> length(grep("^l[A-Za-z0-9]+tmean", x, perl = TRUE, value = TRUE))
BDR> length(grep("^l[[:alnum:]_]+tmean", x, perl = TRUE, value = TRUE))
BDR> which each give 84.
BDR> One issue: PCRE is locale-dependent. Did you use the same locale for
BDR> each? What happens if you force LANG=C?
For me:
- I did use the same locale for both R versions
(LC_CTYPE=de_CH; no explicit LANG, which -- as I have only just
realized -- nowadays (for Red Hat Enterprise) means "en_US.UTF-8" -- aaaargh)
- Forcing LANG=C helps (giving 84).
And I was wrong in saying that we had upgraded PCRE between 1.9.0
and R-patched.
Still, it is quite peculiar that the same locale settings lead to
different PCRE behavior.
Could it be that the locale at *BUILD* time plays a role as well?
That might explain it for me.
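
A minimal sketch of how the locale comparison could be reproduced from
within R. The vector `x` from the original report is not shown in this
thread, so the data below is hypothetical, and "^l\\w+tmean" is only an
assumed form of the original pattern (BDR describes it as l, then one or
more `word' characters, then tmean). The helper names count_matches and
count_in_c_locale are made up for illustration.

```r
# Hypothetical data: the vector `x` from the original report is not shown here.
x <- c("lfootmean", "lbar2tmean", "ltmean", "xfootmean")

# Count matches for the assumed original pattern and BDR's two equivalents.
count_matches <- function(x) {
  patterns <- c(word  = "^l\\w+tmean",          # assumed form of the original
                ascii = "^l[A-Za-z0-9]+tmean",  # from BDR's reply
                alnum = "^l[[:alnum:]_]+tmean") # from BDR's reply
  sapply(patterns, function(p) length(grep(p, x, perl = TRUE, value = TRUE)))
}

# Run the counts with LC_CTYPE temporarily set to "C", then restore it.
count_in_c_locale <- function(x) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old))
  Sys.setlocale("LC_CTYPE", "C")
  count_matches(x)
}

current <- count_matches(x)      # counts under the session's locale
forced  <- count_in_c_locale(x)  # counts under LC_CTYPE = "C"
print(Sys.getlocale("LC_CTYPE"))
print(rbind(current, forced))
```

If the two rows differ for the real `x`, the locale is the likely culprit.
Switching LC_CTYPE inside a running session may not be equivalent to
starting R with LANG=C in the environment, so a fresh session started
that way is probably the more reliable check.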
Martin