[Rd] readLines() truncates text file with Codepage 437 encoding

Martin Maechler maechler at stat.math.ethz.ch
Thu Jun 9 16:40:42 CEST 2016


I can reproduce the issue on Linux (Fedora F22)
with today's R 3.3.0 patched.

Here is code for experimenting that reproduces the issue
without needing an attached file (the function below creates a
temporary file and removes it again):

##---------------------------------------------------------------------------

##' @title write-binary-readLines testing
##' @param i  vector of integers in 0:255 to be used as character codes
##' @param file.name optional
##' @param encoding "437" is the one where the problem has been reported
##' @return the readLines() resulting character string with attributes
##' @author Martin Maechler
wb.readL <- function(i, file.name = tempfile("bin"), encoding = "437") {
    stopifnot(is.integer(i), 0 <= i, i <= 255,
              is.character(file.name))
    ff <- file(file.name, "wb")
    writeBin(as.raw(i), ff)
    close(ff) ; on.exit(unlink(file.name))
    ## Now read "as codepage" :
    ch <- readLines(file(file.name, encoding = encoding))
    ##    ---------                 -------------------  typically gives warning
    structure(ch,
              fSize = file.size(file.name),
              nchars = c(b = nchar(ch, "b"),
                         c = nchar(ch, "c"),
                         w = nchar(ch, "w")))
}

ii <- c(11:12, 14:255, 10L)
(cc <- wb.readL(ii))

##---------------------------------------------------------------------------


gives

> (cc <- wb.readL(ii))
[1] "\v\f\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037 !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\177ÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜ¢£¥₧ƒáíóúñѪº¿⌐¬½¼¡«»░▒▓│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀αßΓπΣσµτΦΘΩδ∞φε∩≡±≥≤⌠⌡÷≈°∙·√ⁿ"
attr(,"fSize")
[1] 245
attr(,"nchars")
  b   c   w
427 241 241
Warning message:
In readLines(file(file.name, encoding = encoding)) :
  incomplete final line found on '/tmp/RtmpaPyDyp/bin65842896d5f1'
>
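The 245 bytes written are 244 single-byte CP437 characters plus a final newline (10L), so a correct decode should yield 244 characters. nchar(cc, "c") reports 241, and the printed string stops at "√ⁿ" (bytes 0xFB, 0xFC), so the last three characters (0xFD to 0xFF) appear to be lost together with the newline, which would also account for the "incomplete final line" warning. A minimal cross-check, sketched under the assumption that the platform's iconv() accepts "437" as an encoding name (as glibc does) and that R runs in a UTF-8 locale, decodes the same bytes in memory and compares them with readLines(); the names `expected`, `got` and `tf` are just illustrative:

```r
## Sketch only: assumes iconv() knows "437" (glibc / GNU libiconv do)
## and a UTF-8 locale, so readLines() can re-encode every character.
ii    <- c(11:12, 14:255, 10L)
bytes <- as.raw(ii)

## Reference decoding in memory: every byte except the final newline
## is one CP437 character, so this should give 244 characters.
expected <- iconv(rawToChar(bytes[-length(bytes)]), from = "437", to = "UTF-8")

## Same bytes through a re-encoding file connection, as in wb.readL()
tf <- tempfile("bin")
writeBin(bytes, tf)
con <- file(tf, encoding = "437")
got <- readLines(con)   # readLines() opens 'con' and closes it again
close(con)              # destroy the (already closed) connection object
unlink(tf)

cat("bytes written:       ", length(bytes), "\n")
cat("chars expected:      ", nchar(expected, "chars"), "\n")
cat("chars via readLines: ", nchar(got, "chars"), "\n")
identical(got, expected)  # should be TRUE once no characters are dropped
```

Comparing against iconv() separates the connection's re-encoding path from the codepage tables themselves: if iconv() returns all 244 characters but readLines() does not, the loss happens in the connection layer rather than in the conversion.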


