[Rd] readlines() truncates text file with Codepage 437 encoding
Martin Maechler
maechler at stat.math.ethz.ch
Thu Jun 9 16:40:42 CEST 2016
I can reproduce the issue on Linux (Fedora F22),
R 3.3.0 patched as of today.
Here's code for experimenting that reproduces the
issue without the need for an attached file (a temporary
file is created and removed inside the function below):
##---------------------------------------------------------------------------
##' @title write-binary-readLines testing
##' @param i vector of integers in 0:255 to be used as character codes
##' @param file.name optional
##' @param encoding "437" is the one where the problem has been reported
##' @return the readLines() resulting character string with attributes
##' @author Martin Maechler
wb.readL <- function(i, file.name = tempfile("bin"), encoding = "437") {
    stopifnot(is.integer(i), 0 <= i, i <= 255,
              is.character(file.name))
    ff <- file(file.name, "wb")
    writeBin(as.raw(i), ff)
    close(ff) ; on.exit(unlink(file.name))
    ## Now read "as codepage" :
    ch <- readLines(file(file.name, encoding = encoding))
    ##    ---------      -------------------  typically gives warning
    structure(ch,
              fSize = file.size(file.name),
              nchars = c(b = nchar(ch, "b"),
                         c = nchar(ch, "c"),
                         w = nchar(ch, "w")))
}

ii <- c(11:12, 14:255, 10L)
(cc <- wb.readL(ii))
##---------------------------------------------------------------------------
gives
> (cc <- wb.readL(ii))
[1] "\v\f\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037 !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\177ÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜ¢£¥₧ƒáíóúñѪº¿⌐¬½¼¡«»░▒▓│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀αßΓπΣσµτΦΘΩδ∞φε∩≡±≥≤⌠⌡÷≈°∙·√ⁿ"
attr(,"fSize")
[1] 245
attr(,"nchars")
  b   c   w
427 241 241
Warning message:
In readLines(file(file.name, encoding = encoding)) :
incomplete final line found on '/tmp/RtmpaPyDyp/bin65842896d5f1'
>
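To make the truncation explicit, here is a small sketch that compares the number of bytes written with the character count reported above. It assumes that every byte in codepage 437 maps to exactly one character and that the characters lost are the trailing ones, which is consistent with the printed string ending in "ⁿ". The names n.expected, n.read, n.missing and missing.bytes are only illustrative, and 241 is copied by hand from the nchars output.

```r
## Same byte vector as above: 244 payload bytes plus the final newline (10)
ii <- c(11:12, 14:255, 10L)

n.bytes    <- length(ii)       # matches the reported fSize
n.expected <- n.bytes - 1L     # the final newline is not part of the line
n.read     <- 241L             # nchars "c" copied from the output above

## Number of characters that did not make it into the result
n.missing <- n.expected - n.read

## Under the assumptions stated above, these are the byte codes that were dropped
missing.bytes <- tail(ii[-length(ii)], n.missing)

print(c(expected = n.expected, read = n.read, missing = n.missing))
print(missing.bytes)
```

So under these assumptions the three characters that readLines() drops are the ones for byte codes 253 to 255.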