[Rd] pb in regular expression with the character "-" (PR#9437)
Herve Pages
hpages at fhcrc.org
Sat Jan 6 05:41:48 CET 2007
Hi all,
maechler at stat.math.ethz.ch wrote:
>
> Consider my guesstimate:
> For 99% of all R users, the amount of time they need working
> pretty intensely with R before they find a bug in it,
> is nowadays more than three years, and maybe even much more
> -- such as their lifetime :-)
Perhaps I belong to the 1% of unlucky users who don't have to
wait that long ;-)
> nchar("éA", type = "bytes")
[1] 3
> nchar("éA", type = "chars")
[1] 2
OK
Now:
> regexpr("A", "éA")
[1] 2
attr(,"match.length")
[1] 1
still OK
But:
> regexpr("A", "éA", useBytes=TRUE)
[1] 2
attr(,"match.length")
[1] 1
not OK anymore: with useBytes=TRUE the position should be counted in bytes, so 3 is expected, not 2
Let's try with fixed=TRUE:
> regexpr("A", "éA", useBytes=TRUE, fixed=TRUE)
[1] 3
attr(,"match.length")
[1] 1
much better!
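To spell out where the 3 comes from: in UTF-8, "é" is encoded as two bytes, so "A" starts at byte 3 even though it is the 2nd character. Below is a small sketch that checks this directly on the raw bytes. It assumes a reasonably recent R in which a "\u00e9" escape yields a UTF-8 string, and it uses only base functions (charToRaw, enc2utf8, nchar). It says nothing about what any particular R version's regexpr returns.

```r
# "éA" written with an escape so the underlying bytes are unambiguous
x <- "\u00e9A"

# UTF-8 bytes of the string: c3 a9 41 ("é" takes two bytes, "A" one)
bytes <- charToRaw(enc2utf8(x))

nchar(x, type = "chars")        # 2 characters
length(bytes)                   # 3 bytes

# Byte position of "A", i.e. what regexpr(..., useBytes = TRUE) should report
which(bytes == charToRaw("A"))  # 3
```

This is the same number that the fixed=TRUE call returns above, which is why that result looks right and the regex one does not.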
H.
> sessionInfo()
R version 2.5.0 Under development (unstable) (2007-01-05 r40386)
i686-pc-linux-gnu
locale:
LC_CTYPE=en_CA.UTF-8;LC_NUMERIC=C;LC_TIME=en_CA.UTF-8;LC_COLLATE=en_CA.UTF-8;LC_MONETARY=en_CA.UTF-8;LC_MESSAGES=en_CA.UTF-8;LC_PAPER=en_CA.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_CA.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[7] "base"
This also happens in 2.4.0 and 2.4.1.