[Rd] cat(s, file): infinite loop of "invalid char string in output conversion" warnings with UTF-8 encoding

Henrik Bengtsson henrik.bengtsson at gmail.com
Wed Jan 4 04:02:15 CET 2017


The code snippet below gives a single warning:

  Warning message:
  In cat(s, file = tempfile()) : invalid char string in output conversion

when n <= 10001, whereas with n >= 10002 the call to cat() appears to
generate the same warning over and over in an infinite loop.

n <- 10002L

## A string of n - 1 spaces followed by the single byte 0xa9
r <- raw(length = n)
r[] <- charToRaw(" ")
r[length(r)] <- as.raw(0xa9)
s <- rawToChar(r)

message("Encoding: native.enc")
options(encoding = "native.enc")
cat(s, file = tempfile())

message("Encoding: UTF-8")
options(encoding = "UTF-8")
cat(s, file = tempfile())

message("DONE")


Here cat() never returns. The R process runs at 100% CPU without any
apparent increase in its memory usage, and the call can be interrupted:

^C
There were 50 or more warnings (use warnings() to see the first 50)
> traceback()
8: "factor" %in% attrib[["class", exact = TRUE]]
7: structure(list(message = as.character(message), call = call),
       class = class)
6: simpleWarning(msg, call)
5: doWithOneRestart(return(expr), restart)
4: withOneRestart(expr, restarts[[1L]])
3: withRestarts({
       .Internal(.signalCondition(simpleWarning(msg, call), msg,
           call))
       .Internal(.dfltWarn(msg, call))
   }, muffleWarning = function() NULL)
2: .signalSimpleWarning("invalid char string in output conversion",
       quote(cat(s, file = tempfile())))
1: cat(s, file = tempfile())
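The only non-space byte in s is 0xa9. In binary that is 10101001: its top two bits are 10, the pattern UTF-8 reserves for continuation bytes, so on its own it cannot start a character (in Latin-1 the same byte is the copyright sign ©). The C sketch below is only an illustration of that point: it reports the first offset at which a buffer stops being well-formed UTF-8. The function utf8_first_invalid is hypothetical, not part of R, and it checks only the lead/continuation byte structure rather than every rule of UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of R): return the offset of the first byte
   at which s[0..len) stops being well-formed UTF-8, or len if it is fine.
   Only the lead/continuation byte structure is checked; overlong 3- and
   4-byte forms and surrogates are not rejected. */
static size_t utf8_first_invalid(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t need; /* number of continuation bytes expected after b */
        if (b < 0x80)
            need = 0;                  /* 0xxxxxxx: ASCII */
        else if (b >= 0xC2 && b <= 0xDF)
            need = 1;                  /* 110xxxxx */
        else if (b >= 0xE0 && b <= 0xEF)
            need = 2;                  /* 1110xxxx */
        else if (b >= 0xF0 && b <= 0xF4)
            need = 3;                  /* 11110xxx */
        else
            return i;  /* stray continuation byte (10xxxxxx, e.g. 0xA9)
                          or a byte that never leads (0xC0, 0xC1, 0xF5-0xFF) */
        if (len - i - 1 < need)
            return i;                  /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return i;              /* expected a 10xxxxxx byte */
        i += need + 1;
    }
    return len;
}
```

Applied to the n = 10002 buffer built by the script above, it returns n - 1, the offset of the 0xa9 byte, whereas the two-byte UTF-8 encoding of © (0xC2 0xA9) is accepted.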


## SOME TROUBLESHOOTING

Using options(warn = 1) shows that the "invalid char string in output
conversion" warning is emitted over and over in an infinite loop.
This warning is generated by dummy_vfprintf(), defined in
src/main/connections.c
(https://github.com/wch/r-source/blob/R-3-3-branch/src/main/connections.c#L370):

# define BUFSIZE 10000
int dummy_vfprintf(Rconnection con, const char *format, va_list ap)
{
    [...]
    if(ires == (size_t)(-1) && errno != E2BIG)
        /* is this safe? */
        warning(_("invalid char string in output conversion"));
    [...]
}

Note BUFSIZE, and note the comment /* is this safe? */ (added by Brian
Ripley on 2005-01-05).
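The [...] elisions hide the loop around the conversion call, so what follows is a hypothesis, not a reading of that code. Suppose the loop converts into an output buffer of BUFSIZE bytes, repeats while a retry flag is true and input remains, and sets that flag on E2BIG but never clears it. With 10001 spaces before 0xa9, the first pass fills the buffer and stops with E2BIG; the second pass converts the last space and then fails on 0xa9 with EILSEQ without consuming it. The flag is still set and one byte of input remains, so every further pass warns and makes no progress. The C sketch below simulates this. mock_convert is a hypothetical stand-in that follows the iconv() return conventions; it is not R's Riconv(). It checks for an invalid input byte before checking for output space, an ordering chosen here so that the simulated threshold falls between n = 10001 and n = 10002 as in the report; a real iconv() may order these checks differently. convert_in_chunks, sticky_again and max_passes are likewise hypothetical, and max_passes exists only so the simulation terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BUFSIZE 10000 /* same value as in connections.c */

/* Hypothetical stand-in for an iconv()-style call (NOT R's Riconv):
   copies ASCII bytes from *in to *out, advancing both pointers and
   decrementing both counters. Returns 0 once all input is consumed, or
   (size_t)-1 with errno = EILSEQ (byte >= 0x80, left unconsumed) or
   errno = E2BIG (output buffer full). The input check comes first. */
static size_t mock_convert(const char **in, size_t *inleft,
                           char **out, size_t *outleft)
{
    while (*inleft > 0) {
        unsigned char b = (unsigned char) **in;
        if (b >= 0x80) { errno = EILSEQ; return (size_t) -1; }
        if (*outleft == 0) { errno = E2BIG; return (size_t) -1; }
        *(*out)++ = (char) b;
        (*in)++;
        (*inleft)--;
        (*outleft)--;
    }
    return 0;
}

/* Hypothetical chunked conversion loop. Returns the number of passes
   made (at most max_passes) and stores the number of passes that would
   have raised the "invalid char string" warning in *warnings. */
static int convert_in_chunks(const char *input, size_t len, int sticky_again,
                             int max_passes, int *warnings)
{
    static char outbuf[BUFSIZE + 1];   /* +1 leaves room for the nul */
    const char *ib = input;
    size_t inb = len;
    int again = 0, passes = 0;
    assert(max_passes > 0 && warnings != NULL);
    *warnings = 0;
    do {
        char *ob = outbuf;
        size_t onb = BUFSIZE;
        errno = 0;
        size_t ires = mock_convert(&ib, &inb, &ob, &onb);
        int failed = (ires == (size_t) -1);
        if (sticky_again) {
            /* suspected pattern: set on E2BIG, never reset */
            if (failed && errno == E2BIG) again = 1;
        } else {
            /* recompute the flag from this pass only */
            again = failed && errno == E2BIG;
        }
        if (failed && errno != E2BIG)
            (*warnings)++;             /* where R would call warning() */
        *ob = '\0';                    /* a real connection would write outbuf here */
        passes++;
    } while (again && inb > 0 && passes < max_passes);
    return passes;
}
```

With sticky_again set, the n = 10002 buffer runs until max_passes and counts one warning on every pass after the first. Recomputing the flag on each pass (sticky_again = 0) stops after two passes and one warning, and the n = 10001 buffer stops after a single pass either way.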



## SESSION DETAILS

I can reproduce this on R 2.11.0, R 3.3.2 and R devel on Linux. It
does not occur with R 3.3.2 for Windows running under Wine on Linux.

> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

> sessionInfo()
R Under development (unstable) (2017-01-02 r71875)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.0


