[Rd] pb in regular expression with the character "-" (PR#9437)

Herve Pages hpages at fhcrc.org
Sat Jan 6 05:41:48 CET 2007


Hi all,

maechler at stat.math.ethz.ch wrote:
> 
> Consider my guesstimate:
> For 99% of all R users, the amount of time they need working
> pretty intensely with R before they find a bug in it, 
> is nowadays more than three years, and maybe even much more
> -- such as their lifetime :-)

Perhaps I belong to the 1% of unlucky users who don't have to
wait that long ;-)

  > nchar("éA", type = "bytes")
  [1] 3
  > nchar("éA", type = "chars")
  [1] 2

  OK

Now:

  > regexpr("A", "éA")
  [1] 2
  attr(,"match.length")
  [1] 1

  still OK

But:

  > regexpr("A", "éA", useBytes=TRUE)
  [1] 2
  attr(,"match.length")
  [1] 1

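  not OK anymore: with useBytes=TRUE the match position should be
  counted in bytes, and "é" takes 2 bytes in UTF-8, so 3 is expected,
  not 2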
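To make the expected value concrete, here is a small sketch that finds the
offset by hand from the raw UTF-8 bytes of the string. byte_offset() is a
hypothetical helper written only for illustration (it is not part of base R).
It assumes UTF-8 input and handles fixed strings only, not regular expressions.

```r
# Hypothetical helper (not base R): 1-based byte offset of the first
# occurrence of a fixed string, found by scanning the raw UTF-8 bytes.
# Returns -1L when there is no match, like regexpr() does.
byte_offset <- function(pattern, x) {
  xb <- charToRaw(enc2utf8(x))        # bytes of the subject string
  pb <- charToRaw(enc2utf8(pattern))  # bytes of the pattern
  n <- length(pb)
  for (i in seq_len(max(0L, length(xb) - n + 1L))) {
    if (identical(xb[i:(i + n - 1L)], pb)) return(i)
  }
  -1L
}

x <- "\u00e9A"       # "éA": U+00E9 is encoded as 2 bytes (c3 a9), "A" as 1 byte (41)
charToRaw(x)          # c3 a9 41
byte_offset("A", x)   # 3, the byte position useBytes=TRUE should report
```

This agrees with what regexpr() returns with fixed=TRUE and useBytes=TRUE
in the transcript, which is the behaviour one would also expect when
fixed=FALSE.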

Let's try with fixed=TRUE:

  > regexpr("A", "éA", useBytes=TRUE, fixed=TRUE)
  [1] 3
  attr(,"match.length")
  [1] 1

  much better!

H.


> sessionInfo()
R version 2.5.0 Under development (unstable) (2007-01-05 r40386)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_CA.UTF-8;LC_NUMERIC=C;LC_TIME=en_CA.UTF-8;LC_COLLATE=en_CA.UTF-8;LC_MONETARY=en_CA.UTF-8;LC_MESSAGES=en_CA.UTF-8;LC_PAPER=en_CA.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_CA.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"

But this also happens in 2.4.0 and 2.4.1.
