[Rd] Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
Winston Chang
winstonchang1 at gmail.com
Mon Mar 2 20:14:18 CET 2015
On Windows, grep(fixed=TRUE) throws errors with some UTF-8 strings.
Here's an example (must be run on Windows to reproduce the error):

    Sys.setlocale("LC_CTYPE", "chinese")
    y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
    Encoding(y) <- "UTF-8"
    y
    # [1] "渗"
    grep("\n", y, fixed = TRUE)
    # Error in grep("\n", y, fixed = TRUE) : invalid multibyte string at '<97>'

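A minimal sketch of two checks that may help narrow this down. It first confirms that the three bytes decode to a single valid code point, so the input itself is well-formed UTF-8. It then tries useBytes = TRUE, which makes grepl() compare bytes rather than characters. That this avoids the error on Windows is an assumption; I have not verified it there.

```r
# Rebuild the string from the example above and mark it as UTF-8.
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"

# utf8ToInt() returns NA for malformed UTF-8; here it yields one
# code point, U+6E17, so the bytes themselves are valid.
cp <- utf8ToInt(y)
stopifnot(identical(cp, 0x6E17L))

# Possible workaround (assumption, untested on Windows): match
# byte-wise so no multibyte conversion of y is attempted.
hit <- grepl("\n", y, fixed = TRUE, useBytes = TRUE)
hit
```

Byte-wise matching is safe for this pattern because "\n" is a single ASCII byte, which cannot occur inside a multibyte UTF-8 sequence.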
In my particular case, I'm using parse() on a string that contains
characters like this, and it triggers the same error, because parse()
calls srcfilecopy(), which calls grepl():

    parse(text=y)
    # Error in grepl("\n", lines, fixed = TRUE) :
    #   invalid multibyte string at '<97>'

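Since the failure comes from srcfilecopy(), one thing that might sidestep it is asking parse() not to keep source references, in which case no source copy should be made. This is only a sketch under that assumption and is untested on Windows. The quoted-string wrapper `txt` is a hypothetical input I made up for illustration, standing in for code that contains such characters.

```r
# Same string as in the example above.
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"

# Hypothetical input: an assignment whose string literal contains y.
txt <- paste0('x <- "', y, '"')

# With keep.source = FALSE, parse() is not expected to call
# srcfilecopy(), so the grepl() call that fails is never reached.
exprs <- parse(text = txt, keep.source = FALSE)
length(exprs)
```

The cost is that the result carries no srcref attributes, so tools that rely on source references would not have them.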
Am I right in assuming that this isn't the expected behavior?
-Winston