[Rd] gsub() hex character range problems in R-devel?
Martin Morgan
mtmorgan.bioc at gmail.com
Tue Jan 4 20:35:30 CET 2022
I'm not very good at character encodings, so this might be user error. The following code is meant to remove extended ASCII characters (in particular a non-breaking space) by replacing them with "". It works in the R-4-1-branch:
> R.version.string
[1] "R version 4.1.2 Patched (2022-01-04 r81445)"
> gsub("[\x7f-\xff]", "", "fo\xa0o")
[1] "foo"
but it fails in R-devel:
> R.version.string
[1] "R Under development (unstable) (2022-01-04 r81445)"
> gsub("[\x7f-\xff]", "", "fo\xa0o")
Error in gsub("[\177-\xff]", "", "fo\xa0o") : invalid regular expression '[-�]', reason 'Invalid character range'
In addition: Warning message:
In gsub("[\177-\xff]", "", "fo\xa0o") :
TRE pattern compilation error 'Invalid character range'
There are other oddities too, for example:
> gsub("[[:alnum:]]", "", "fo\xa0o") # R-4-1-branch
[1] "\xfc\xbe\x8c\x86\x84\xbc"
> gsub("[[:alnum:]]", "", "fo\xa0o") # R-devel
[1] "<>"
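A quick check that may help narrow this down, assuming (not confirmed) that the culprit is the "\x" escapes producing bytes that are not valid UTF-8 under the en_US.UTF-8 locale shown below. It uses only base R's validUTF8() and the useBytes argument of gsub(); that useBytes = TRUE avoids the error in R-devel is an assumption I have not verified.

```r
# Diagnostic sketch. Assumption: the failures come from the pattern and the
# input containing raw bytes that are not valid UTF-8 in a UTF-8 locale.
pattern <- "[\x7f-\xff]"
x <- "fo\xa0o"

validUTF8(pattern)  # FALSE: \xff can never appear in valid UTF-8
validUTF8(x)        # FALSE: a lone \xa0 is not a valid UTF-8 sequence

# Possible workaround (unverified on R-devel): match byte by byte instead of
# interpreting the strings as UTF-8 characters
gsub(pattern, "", x, useBytes = TRUE)

# Alternatively, write the non-breaking space as a Unicode escape so that
# both the pattern and the input are valid UTF-8
gsub("\u00a0", "", "fo\u00a0o")  # "foo"
```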
The R-devel sessionInfo() is:
> sessionInfo()
R Under development (unstable) (2022-01-04 r81445)
Platform: x86_64-apple-darwin19.6.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /Users/ma38727/bin/R-devel/lib/libRblas.dylib
LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.2.0
(I built my own R on macOS; I see similar behavior on a Linux machine.)
Any hints welcome,
Martin Morgan