[Rd] cat(s, file): infinite loop of "invalid char string in output conversion" warnings with UTF-8 encoding
Henrik Bengtsson
henrik.bengtsson at gmail.com
Wed Jan 4 04:02:15 CET 2017
The code snippet below gives a single warning:
Warning message:
In cat(s, file = tempfile()) : invalid char string in output conversion
when n <= 10001, whereas with n >= 10002 it appears to be generating
the same warning in an infinite loop in the call to cat().
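n <- 10002L
## Build a string of n - 1 spaces followed by the single byte 0xA9,
## which is not a valid character on its own in ASCII or UTF-8
r <- raw(length = n)
r[] <- charToRaw(" ")
r[length(r)] <- as.raw(0xa9)
s <- rawToChar(r)
## Write the string using the native encoding
message("Encoding: native.enc")
options(encoding = "native.enc")
cat(s, file = tempfile())
## Write the same string with the output connection converting to UTF-8
message("Encoding: UTF-8")
options(encoding = "UTF-8")
cat(s, file = tempfile())
message("DONE")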
Here cat() never returns. The R process runs at 100% CPU, it does not
appear to increase its memory usage, and the call can be interrupted:
^C
There were 50 or more warnings (use warnings() to see the first 50)
> traceback()
8: "factor" %in% attrib[["class", exact = TRUE]]
7: structure(list(message = as.character(message), call = call),
class = class)
6: simpleWarning(msg, call)
5: doWithOneRestart(return(expr), restart)
4: withOneRestart(expr, restarts[[1L]])
3: withRestarts({
.Internal(.signalCondition(simpleWarning(msg, call), msg,
call))
.Internal(.dfltWarn(msg, call))
}, muffleWarning = function() NULL)
2: .signalSimpleWarning("invalid char string in output conversion",
quote(cat(s, file = tempfile())))
1: cat(s, file = tempfile())
## SOME TROUBLESHOOTING
Using options(warn = 1) shows that the "invalid char string in output
conversion" warning is emitted over and over in an infinite loop.
This warning is generated by dummy_vfprintf(), defined in
src/main/connections.c
(https://github.com/wch/r-source/blob/R-3-3-branch/src/main/connections.c#L370):
# define BUFSIZE 10000
int dummy_vfprintf(Rconnection con, const char *format, va_list ap)
{
[...]
if(ires == (size_t)(-1) && errno != E2BIG)
/* is this safe? */
warning(_("invalid char string in output conversion"));
[...]
}
Note that BUFSIZE (10000) is close to the value of n at which the
behavior changes, and note the comment /* is this safe? */ (by Brian
Ripley on 2005-01-05).
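
To illustrate one way a chunked iconv() loop can spin on a trailing invalid byte, here is a minimal standalone C sketch. It is not the code from connections.c and I have not confirmed that R fails this way. The function name convert_chunked(), the sticky_retry flag, the iteration cap and the 64-byte buffers are all hypothetical and exist only for illustration. The assumption being modeled is that a "retry" flag is set when iconv() reports E2BIG and is never cleared, while an invalid byte (EILSEQ) leaves the input pointer where it is, so the remaining input never reaches zero.

```c
#include <errno.h>
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, NOT the code in src/main/connections.c.
 * Converts `len` bytes of `input` from ASCII to UTF-8 through an output
 * buffer of `bufsize` bytes (both limited to 64 here). Returns the number
 * of loop iterations used, or -1 if `max_iter` is exceeded (standing in
 * for "would loop forever"), the sizes are too large, or the converter
 * cannot be opened.
 *
 * sticky_retry = true : the retry flag is set on E2BIG and never cleared
 *                       (the assumed failure mode).
 * sticky_retry = false: the retry flag is recomputed on every iteration.
 */
int convert_chunked(const char *input, size_t len, size_t bufsize,
                    bool sticky_retry, int max_iter)
{
    char inbuf[64], outbuf[64];
    if (len > sizeof inbuf || bufsize > sizeof outbuf) return -1;
    memcpy(inbuf, input, len);

    iconv_t cd = iconv_open("UTF-8", "ASCII");
    if (cd == (iconv_t)(-1)) return -1;

    char *ib = inbuf;      /* next unconverted input byte */
    size_t inb = len;      /* input bytes left */
    bool again = false;
    int iter = 0;

    do {
        if (++iter > max_iter) { iter = -1; break; }

        char *ob = outbuf; /* start each pass with an empty output buffer */
        size_t onb = bufsize;
        errno = 0;
        size_t ires = iconv(cd, &ib, &inb, &ob, &onb);
        bool too_big = (ires == (size_t)(-1) && errno == E2BIG);

        if (sticky_retry) {
            if (too_big) again = true;  /* never reset once set */
        } else {
            again = too_big;            /* reset on every pass */
        }
        /* On EILSEQ, iconv() leaves ib at the invalid byte, so inb stays > 0. */
    } while (again && inb > 0);

    iconv_close(cd);
    return iter;
}
```

With an 8-byte output buffer and a 10-byte input of nine spaces followed by 0xa9 (the scaled-down analogue of n = BUFSIZE + 2), the first pass should fill the buffer and report E2BIG, and the second pass should stop at the invalid byte. In the sticky variant the loop condition then stays true indefinitely, which would match the repeated warning seen above. In the non-sticky variant the loop ends after the second pass.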
## SESSION DETAILS
I can reproduce this on R 2.11.0, R 3.3.2 and R devel on Linux. It
does not occur with R 3.3.2 for Windows running under Wine on Linux.
> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
> sessionInfo()
R Under development (unstable) (2017-01-02 r71875)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.0