[Rd] Issue with Control-Z in a text file on Windows - readLines() appears to truncate
Sean O'Riordain
seanpor at acm.org
Wed Apr 10 16:20:21 CEST 2013
Working on Windows, I have had to deal with CSV files that,
unfortunately, contain embedded Control-Z characters, i.e. ASCII
character 26 in decimal. The readLines() function in R on Windows
(2.15.2 and 3.0.0) appears to truncate its input at the Control-Z.
There is no problem at all on Ubuntu Linux with R 3.0.0.
Am I mistaken, or is this a genuine problem?
# Create a small file with embedded Control-Z
h3 <- paste('1,34,44.4,"', rawToChar(as.raw(c(65, 26, 65))), '",99')
h3
# "1,34,44.4,\" A\032A \",99"
writeLines(h3, 'h3.txt')
# now attempt to read the file back in
h3a <- readLines('h3.txt')
# but on Windows with R 2.15.2 and 3.0.0 I get the message
#Warning message:
#In readLines("h3.txt") : incomplete final line found on 'h3.txt'
h3a
# [1] "1,34,44.4,\" A"
# so it drops from the Control-Z onwards
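To check that the bytes after the Control-Z really are on disk, and that only the reading step loses them, the file can be inspected as raw bytes. The following is a minimal sketch on a small demo file; the names demo_file, file_bytes, ctrl_z_pos and bytes_after are only illustrative and are not part of the example above.

```r
# Sketch: inspect the raw bytes of a file to confirm that a Control-Z (0x1a)
# and the bytes following it are present on disk.
demo_file <- tempfile(fileext = ".txt")   # hypothetical demo file

# Write "A", Control-Z, "A", CR, LF in binary mode so nothing is translated
con <- file(demo_file, open = "wb")
writeBin(as.raw(c(65, 26, 65, 13, 10)), con)
close(con)

# Read the whole file back as raw bytes
file_bytes <- readBin(demo_file, what = "raw", n = file.info(demo_file)$size)

# Positions of any Control-Z bytes, and how many bytes follow the last one
ctrl_z_pos  <- which(file_bytes == as.raw(26))
bytes_after <- length(file_bytes) - max(ctrl_z_pos)

unlink(demo_file)
```

If bytes_after is greater than zero, the data after the Control-Z exists in the file and the loss happens on reading.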
####
# The following is my rough and ready workaround - I'm sure there is a
# cleaner way
fnam <- 'h3.txt'
tmp.bin <- readBin(fnam, raw(), size=1, n=max(2*file.info(fnam)$size, 100))
tmp.char <- rawToChar(tmp.bin)
txt <- unlist(strsplit(tmp.char, '\r\n', fixed=TRUE))
txt
# [1] "1,34,44.4,\" A\032A \",99"
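A possibly cleaner variant is to hand readLines() a connection opened in binary mode. This is a sketch under the assumption that the truncation comes from Windows text-mode file handling treating Control-Z as end-of-file; I have not confirmed that this is the cause. The helper name read_lines_binary is hypothetical. readLines() is documented to accept LF, CRLF or CR as the line terminator whatever mode the connection is opened in, so no manual splitting on '\r\n' should be needed.

```r
# Sketch: read lines through a connection opened in binary mode ("rb"),
# so that no text-mode translation is applied while reading.
read_lines_binary <- function(path) {    # hypothetical helper name
  con <- file(path, open = "rb")
  on.exit(close(con))
  readLines(con)
}

# Build a test line with an embedded Control-Z (ASCII 26)
ctrl_z_line <- paste0('1,34,44.4,"', rawToChar(as.raw(c(65, 26, 65))), '",99')

# Write it in binary mode with an explicit CRLF terminator, as on Windows
tmp <- tempfile(fileext = ".txt")
out <- file(tmp, open = "wb")
writeLines(ctrl_z_line, out, sep = "\r\n")
close(out)

# Read it back; the Control-Z and everything after it should survive
res <- read_lines_binary(tmp)
unlink(tmp)
```

Compared with the readBin() approach, this avoids guessing a buffer size and leaves the line splitting to readLines().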
This was on 64-bit R on 64-bit Windows 7, but it also appears to be
the case with 32-bit R 2.15.2 on 32-bit Windows 7 inside a VirtualBox
virtual machine.
Kind regards,
Sean O'Riordain
Trinity College
Dublin