[R] Potential bug in readLines when reading empty lines

Ebert,Timothy Aaron tebert @end|ng |rom u||@edu
Wed Jun 25 16:39:18 CEST 2025


The end of the file is a problem. In my case, I have data files that can end in one of several ways, and a line can end with \r or \n:
1) No line feed at the end of the last row of data.
2) One line feed at the end of the last row of data.
3) Multiple line feeds at the end of the last row of data.
4) Any of the above, but with carriage returns instead of line feeds.
5) A file could end with a line feed and a carriage return.
Some of this is "self-inflicted." People can open the data files in some other program and "accidentally" add a line feed or several. They then save and close the file before sending it to me.
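
To see which of these cases a given file falls into, something like the following could be used. It is only a sketch: trailing_eol() is a hypothetical helper name, and it assumes the file is small enough to read into memory in one go.

```r
# Hypothetical helper: report the run of \r and \n bytes at the end of a file.
trailing_eol <- function(path) {
  # Read raw bytes so that no end-of-line translation takes place.
  bytes <- readBin(path, what = "raw", n = file.size(path))
  codes <- as.integer(bytes)

  # 13 is carriage return (\r), 10 is line feed (\n).
  is_eol <- codes %in% c(10L, 13L)

  # Position of the last byte that is not a line terminator (0 if none).
  last_data <- if (any(!is_eol)) max(which(!is_eol)) else 0L
  tail_codes <- codes[seq_len(length(codes) - last_data) + last_data]

  list(
    ending = rawToChar(as.raw(tail_codes)),  # "" means no terminator at all
    n_cr   = sum(tail_codes == 13L),
    n_lf   = sum(tail_codes == 10L)
  )
}
```

An empty `ending` corresponds to case 1, a single "\n" to case 2, and so on.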

My approach:
1) Place all files in one folder with nothing else in the folder.
2) In R, get the folder from the user. I used choose.dir().
3) Get a list of all files using list.files().
4) Loop through all of the files.
        a) Read the file in binary using readBin().
        b) Identify whether the file uses \r\n, \n or \r.
                # First count the \r\n pairs; then remove \r\n from the content (if present) and count the remaining \r and \n.
                # gregexpr() returns -1 when there is no match, so count only positive positions.
                  num_crlf <- sum(gregexpr("\r\n", content, fixed = TRUE)[[1]] > 0)
        c) Remove all \n and \r at the end of the file.
        d) Add one line ending of the type identified in 4b to the end of the file.
        e) Save the file.
        f) End loop.
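
The steps above could be sketched roughly as follows. This is not the exact code I use: normalize_eof() and normalize_folder() are hypothetical names, the sketch works on raw bytes rather than on text, and it assumes that a file with no line terminators at all should get \n (ties also go to \n).

```r
# Hypothetical helper: leave exactly one line terminator at the end of a file.
normalize_eof <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  if (length(bytes) == 0L) return(invisible(FALSE))  # skip empty files

  codes <- as.integer(bytes)
  n <- length(codes)
  is_cr <- codes == 13L  # \r
  is_lf <- codes == 10L  # \n

  # A \r immediately followed by \n is one \r\n pair; the rest are lone.
  crlf    <- sum(is_cr[-n] & is_lf[-1])
  lone_lf <- sum(is_lf) - crlf
  lone_cr <- sum(is_cr) - crlf

  # Pick the most frequent style (assumption: \n wins ties and is the default).
  styles <- list(as.raw(10), as.raw(c(13, 10)), as.raw(13))
  eol <- styles[[which.max(c(lone_lf, crlf, lone_cr))]]

  # Drop every trailing \r and \n, then append a single terminator.
  # A file holding only terminators ends up as one terminator.
  is_data <- !(is_cr | is_lf)
  keep <- if (any(is_data)) max(which(is_data)) else 0L
  writeBin(c(bytes[seq_len(keep)], eol), path)
  invisible(TRUE)
}

# Hypothetical wrapper: apply normalize_eof() to every file in a folder.
normalize_folder <- function(folder) {
  files <- list.files(folder, full.names = TRUE)
  files <- files[!dir.exists(files)]  # ignore sub-folders
  for (f in files) normalize_eof(f)
  invisible(files)
}

# Usage (choose.dir() is only available on Windows):
# normalize_folder(choose.dir())
```

Note that the files are overwritten in place, so it is worth running this on a copy of the folder first.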

The exact code will depend on what sort of files you are dealing with. Unexpected files, such as an empty file or a file that has been edited by multiple users, can generate errors unless you trap them.
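
One way to trap such cases is to wrap the per-file work in tryCatch() so that one bad file does not stop the loop. The sketch below is an illustration only: check_file() and safe_check() are hypothetical names, and treating mixed line endings as a warning is my assumption about what "edited by multiple users" tends to look like.

```r
# Hypothetical check: stop() on unusable files, warn on mixed line endings.
check_file <- function(path) {
  size <- file.size(path)
  if (is.na(size)) stop("file not found: ", path)
  if (size == 0) stop("file is empty: ", path)

  codes <- as.integer(readBin(path, what = "raw", n = size))
  n <- length(codes)
  crlf    <- sum(codes[-n] == 13L & codes[-1] == 10L)
  lone_cr <- sum(codes == 13L) - crlf
  lone_lf <- sum(codes == 10L) - crlf

  # More than one style present suggests the file was edited in several tools.
  if (sum(c(crlf, lone_cr, lone_lf) > 0) > 1) {
    warning("mixed line endings: ", path)
  }
  invisible(TRUE)
}

# Run the check on each file and collect a status string instead of failing.
safe_check <- function(paths) {
  vapply(paths, function(p) {
    tryCatch(
      {
        check_file(p)
        "ok"
      },
      warning = function(w) paste("warning:", conditionMessage(w)),
      error   = function(e) paste("error:", conditionMessage(e))
    )
  }, character(1))
}
```

The result is a named character vector, one entry per file, which can be inspected before any file is modified.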

Tim

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Heuvel, E.G. van den (Guido) via R-help
Sent: Wednesday, June 25, 2025 3:00 AM
To: 'r-help using R-project.org' <r-help using R-project.org>
Subject: [R] Potential bug in readLines when reading empty lines

Hi all,

I encountered some weird behaviour with readLines() recently, and I am wondering if this might be a bug, or, if it is not, how to resolve it. The issue is as follows:

If I have a text file where a line ends with just a carriage return (\r, CR) while the next line is empty and ends in a carriage return / linefeed (\r\n, CR LF), then the empty line is skipped when reading the file with readLines. The following code contains a test case:

---
print(R.version)
# platform       x86_64-w64-mingw32
# arch           x86_64
# os             mingw32
# crt            ucrt
# system         x86_64, mingw32
# status
# major          4
# minor          4.0
# year           2024
# month          04
# day            24
# svn rev        86474
# language       R
# version.string R version 4.4.0 (2024-04-24 ucrt)
# nickname       Puppy Cup

txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")

# Write txt_original as binary to avoid unwanted conversion of end of line markers
writeBin(charToRaw(txt_original), "test.txt")

txt_actual <- readLines("test.txt")
print(txt_actual)
# [1] "Line 1" "Line 3"
 ---

I included the output of this script on my machine in the comments. I would expect txt_actual to be equal to c("Line 1", "", "Line 3"), but the empty line is skipped.

Is this a bug? And if not, how should I read test.txt in such a way that the empty 2nd line is left intact?

Best regards,

Guido van den Heuvel
Statistics Netherlands



