[R] Potential bug in readLines when reading empty lines
Ebert,Timothy Aaron
tebert @end|ng |rom u||@edu
Wed Jun 25 16:39:18 CEST 2025
The end of the file is a problem. In my case I have data files that can end in one of several ways, and a line can end with \r, \n, or \r\n.
1) No line feed at the end of the last row of data.
2) One line feed at the end of the last row of data.
3) Multiple line feeds at the end of the last row of data.
4) All of the above except with carriage return.
5) A file could end with a line feed and a carriage return.
Some of this is "self-inflicted." People can open the data files in some other program and "accidentally" add a line feed or several. They then save and close the file before sending it to me.
My approach:
1) Place all files in one folder with nothing else in the folder.
2) In R, get the folder from the user. I used choose.dir() (Windows only).
3) Get a list of all files using list.files()
4) Loop through all of the files.
a) read the file in binary using readBin()
b) Identify if the file uses \r\n, \n or \r.
# First step: count the \r\n pairs. gregexpr() returns -1 when nothing matches, so count
# the positive match positions instead of using length(). Then remove \r\n from the
# content (if present) and count the remaining \r and \n the same way.
num_crlf <- sum(gregexpr("\r\n", content, fixed = TRUE)[[1]] > 0)
c) remove all \n and \r at the end of the file.
d) add one line ending (\r\n, \n or \r) to the end of the file, as identified in 4b.
e) save file
f) end loop
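The steps in 4) could be sketched roughly as follows. This is only a sketch under some assumptions: normalise_trailing_eol() is a made-up name, the file's own convention is taken to be whichever line ending occurs most often, and a file with no line endings at all gets \n. It also counts on the raw bytes rather than with gregexpr(), which avoids converting the file to a character string first.

```r
# Sketch only: normalise_trailing_eol() is a hypothetical helper name.
normalise_trailing_eol <- function(path) {
  size <- file.size(path)
  # Skip missing or empty files; there is nothing to normalise.
  if (is.na(size) || size == 0) return(invisible(NA_character_))
  bytes <- readBin(path, what = "raw", n = size)

  cr <- as.raw(0x0d)
  lf <- as.raw(0x0a)
  n <- length(bytes)

  # Count \r\n pairs first, then the \r and \n bytes that are not part of a pair.
  n_crlf <- sum(bytes[-n] == cr & bytes[-1] == lf)
  counts <- c("\r\n" = n_crlf,
              "\n"   = sum(bytes == lf) - n_crlf,
              "\r"   = sum(bytes == cr) - n_crlf)

  # Assumption: the most frequent ending is the file's convention; default to \n.
  eol <- if (all(counts == 0)) "\n" else names(counts)[which.max(counts)]

  # Drop every trailing \r and \n, then append exactly one line ending.
  is_eol <- bytes == cr | bytes == lf
  keep <- if (all(is_eol)) 0L else max(which(!is_eol))
  writeBin(c(bytes[seq_len(keep)], charToRaw(eol)), path)
  invisible(eol)
}

# Possible use on a folder (choose.dir() is Windows only):
# folder <- choose.dir()
# for (f in list.files(folder, full.names = TRUE)) normalise_trailing_eol(f)
```

Note that this overwrites each file in place, so it is worth trying it on a copy of the folder first.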
The exact code will depend on what sort of files you are dealing with. Unexpected files can generate errors unless you trap for them: for example, an empty file, or a file that has been edited by multiple users.
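One way to trap such cases is to check the file size first and wrap the per-file work in tryCatch(), so one bad file does not stop the loop. The names safe_process() and process_one below are hypothetical; process_one stands for whatever function does the actual work on a single file.

```r
# Sketch only: safe_process() and process_one are hypothetical names.
# Returns one status string per path: "ok", "empty", "missing" or "error: ...".
safe_process <- function(paths, process_one) {
  vapply(paths, function(p) {
    size <- file.size(p)
    if (is.na(size)) return("missing")  # file does not exist or cannot be read
    if (size == 0) return("empty")      # nothing to process
    tryCatch({
      process_one(p)
      "ok"
    }, error = function(e) paste("error:", conditionMessage(e)))
  }, character(1))
}
```

The result is a named character vector, so the files that need a closer look can be found with something like names(res)[res != "ok"].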
Tim
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Heuvel, E.G. van den (Guido) via R-help
Sent: Wednesday, June 25, 2025 3:00 AM
To: 'r-help using R-project.org' <r-help using R-project.org>
Subject: [R] Potential bug in readLines when reading empty lines
Hi all,
I encountered some weird behaviour with readLines() recently, and I am wondering if this might be a bug, or, if it is not, how to resolve it. The issue is as follows:
If I have a text file where a line ends with just a carriage return (\r, CR) while the next line is empty and ends in a carriage return / linefeed (\r\n, CR LF), then the empty line is skipped when reading the file with readLines. The following code contains a test case:
---
print(R.version)
# platform x86_64-w64-mingw32
# arch x86_64
# os mingw32
# crt ucrt
# system x86_64, mingw32
# status
# major 4
# minor 4.0
# year 2024
# month 04
# day 24
# svn rev 86474
# language R
# version.string R version 4.4.0 (2024-04-24 ucrt)
# nickname Puppy Cup
txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")
# Write txt_original as binary to avoid unwanted conversion of end of line markers
writeBin(charToRaw(txt_original), "test.txt")
txt_actual <- readLines("test.txt")
print(txt_actual)
# [1] "Line 1" "Line 3"
---
I included the output of this script on my machine in the comments. I would expect txt_actual to be equal to c("Line 1", "", "Line 3"), but the empty line is skipped.
Is this a bug? And if not, how should I read test.txt in such a way that the empty 2nd line is left intact?
Best regards,
Guido van den Heuvel
Statistics Netherlands
______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.