[R] Potential bug in readLines when reading empty lines

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Wed Jun 25 17:31:12 CEST 2025


On 2025-06-25 10:10 a.m., Jeff Newmiller via R-help wrote:
> As a longtime programmer, I would say that your file is at fault... there is no programming standard that says any software needs to handle this kind of data in any defined way.

The documentation for readLines() says this:

"Whatever mode the connection is opened in, any of LF, CRLF or CR will 
be accepted as the EOL marker for a line."

Perhaps that needs to be clarified to say that all lines in the file 
need to use the same EOL marker for consistent results.
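
One way to see whether a given file mixes conventions, and so might be affected, is to count each kind of line ending in its raw bytes. This is a minimal sketch that assumes the file fits in memory. count_eol_markers() is a hypothetical helper, not a base R function.

```r
# Hypothetical helper (not part of base R): count the line-ending
# conventions present in a file by inspecting its raw bytes.
count_eol_markers <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  is_cr <- bytes == as.raw(0x0d)   # carriage return, "\r"
  is_lf <- bytes == as.raw(0x0a)   # line feed, "\n"
  n <- length(bytes)
  # A CRLF pair is a CR immediately followed by an LF.
  crlf <- sum(is_cr[-n] & is_lf[-1])
  c(CRLF = crlf,
    CR   = sum(is_cr) - crlf,   # CRs not followed by an LF
    LF   = sum(is_lf) - crlf)   # LFs not preceded by a CR
}
```

For the test.txt created in Guido's example, this should report CRLF = 2, CR = 1, LF = 0, i.e. two different conventions in one file.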

Duncan Murdoch

> More specifically, the only standards-based requirements I am aware of 
> require the programmer to specify whether the file is a text file (per 
> the convention driven by the OS) or a binary file. The fact that your 
> file does not conform to a consistent line-ending convention means 
> that any "automatic" identification of line-ending conventions is 
> completely optional.
> 
> Looking at this from the perspective of a user, I think you have two options: fix the process that is feeding you invalid data, or use binary mode to implement the parsing behavior you wish to obtain for this file format.
> 
> In addition, I suppose you could develop a generic line end handling algorithm that you think would resolve this and submit a suggestion/patch to R and hope someone agrees that such a change won't cause more havoc than it avoids for other users. But that would be unlikely to happen in a timely fashion for your current needs.
> 
> On June 24, 2025 11:59:58 PM PDT, "Heuvel, E.G. van den (Guido) via R-help" <r-help using r-project.org> wrote:
>> Hi all,
>>
>> I encountered some weird behaviour with readLines() recently, and I am wondering if this might be a bug, or, if it is not, how to resolve it. The issue is as follows:
>>
>> If I have a text file where a line ends with just a carriage return (\r, CR) while the next line is empty and ends in a carriage return / linefeed (\r\n, CR LF), then the empty line is skipped when reading the file with readLines. The following code contains a test case:
>>
>> ---
>> print(R.version)
>> # platform       x86_64-w64-mingw32
>> # arch           x86_64
>> # os             mingw32
>> # crt            ucrt
>> # system         x86_64, mingw32
>> # status
>> # major          4
>> # minor          4.0
>> # year           2024
>> # month          04
>> # day            24
>> # svn rev        86474
>> # language       R
>> # version.string R version 4.4.0 (2024-04-24 ucrt)
>> # nickname       Puppy Cup
>>
>> txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")
>>
>> # Write txt_original as binary to avoid unwanted conversion of end of line markers
>> writeBin(charToRaw(txt_original), "test.txt")
>>
>> txt_actual <- readLines("test.txt")
>> print(txt_actual)
>> # [1] "Line 1" "Line 3"
>> ---
>>
>> I included the output of this script on my machine in the comments. I would expect txt_actual to be equal to c("Line 1", "", "Line 3"), but the empty line is skipped.
>>
>> Is this a bug? And if not, how should I read test.txt in such a way that the empty 2nd line is left intact?
>>
>> Best regards,
>>
>> Guido van den Heuvel
>> Statistics Netherlands
>
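
Jeff's second option, doing the parsing yourself in binary mode, might look like the following. This is a minimal sketch under stated assumptions: read_lines_mixed_eol() is a hypothetical name, the whole file is assumed to fit in memory with no embedded nul bytes, and every CR, LF or CRLF is assumed to end exactly one line.

```r
# Hypothetical helper (not part of base R): read a file in binary mode and
# split it into lines, treating each CRLF, bare CR, or bare LF as exactly
# one line ending, so empty lines survive even when conventions are mixed.
read_lines_mixed_eol <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  txt <- rawToChar(bytes)   # errors if the file contains embedded nul bytes
  # With perl = TRUE the alternatives are tried left to right, so "\r\n"
  # is consumed as one terminator before a lone "\r" is considered.
  # strsplit() drops a trailing empty piece, so a final line terminator
  # does not produce an extra "" (as with readLines()).
  strsplit(txt, "\r\n|\r|\n", perl = TRUE)[[1]]
}
```

Run on the test.txt from Guido's example, it should return c("Line 1", "", "Line 3"). Reading everything at once keeps the sketch short; a very large file would need chunked reading that carries a trailing CR over to the next chunk.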


