[R] Potential bug in readLines when reading empty lines

Richard O'Keefe r@oknz @end|ng |rom gm@||@com
Thu Jun 26 10:25:34 CEST 2025


Why impossible?
The Smalltalk system I normally use does exactly this.
To read a line, read one character at a time until
  - the end of the stream; or
  - an LF; or
  - a CR, in which case consume the next character as well iff it is an LF.
It doesn't even have the concept of an input stream having a line end
convention.  Why would it?
(Output streams, yes.  Input streams, no.)
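
For concreteness, here is a minimal sketch of that rule in R. It is only
an illustration under my own assumptions, not how readLines() or any
Smalltalk system is actually implemented; the function name
split_lines_any_eol is hypothetical, and it works on a raw vector so that
no connection-level translation gets in the way.

```r
# Hypothetical helper (not part of R): split a raw vector into lines,
# accepting LF, CR, or CRLF as the terminator of each individual line.
split_lines_any_eol <- function(bytes) {
  CR <- as.raw(13L)
  LF <- as.raw(10L)
  lines <- character(0)
  start <- 1L            # index of the first byte of the current line
  i <- 1L
  n <- length(bytes)
  while (i <= n) {
    b <- bytes[i]
    if (b == LF || b == CR) {
      # Emit the bytes from 'start' up to (not including) the terminator.
      lines <- c(lines, rawToChar(bytes[seq_len(i - start) + start - 1L]))
      # After a CR, consume the next byte as well iff it is an LF.
      if (b == CR && i < n && bytes[i + 1L] == LF) i <- i + 1L
      start <- i + 1L
    }
    i <- i + 1L
  }
  # A final line without a terminator ends at the end of the stream.
  if (start <= n) lines <- c(lines, rawToChar(bytes[start:n]))
  lines
}

print(split_lines_any_eol(charToRaw("Line 1\r\r\nLine 3\r\n")))
```

For a file, the bytes could be obtained with
readBin("test.txt", "raw", n = file.size("test.txt")).  Growing the result
with c() is slow for large inputs; the sketch favours clarity over speed.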

Given that we've had SUA then WSL in the Windows world for years now,
it's not that uncommon for a file to be edited in both a Windows
editor and a Unix editor and end up with a mix of line endings.  In the Apple
world we had A/UX and MachTen, so mixed LF and CR files were not that
rare either.

If R *did* guess from the first line, then the second line would not disappear.
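
To illustrate (this only mimics a reader that has settled on CR as the
sole terminator; it says nothing about what readLines() does internally),
splitting the test string from the original post on CR alone keeps the
empty second element:

```r
# Treat CR as the only line terminator, as a reader would if it had
# guessed the convention from the first line of the test data.
txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")
parts <- strsplit(txt_original, "\r", fixed = TRUE)[[1]]
print(parts)
# The second element is the empty string; the LF bytes simply stay
# attached to the following pieces as ordinary characters.
```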

On Thu, 26 Jun 2025 at 04:58, Jeff Newmiller via R-help
<r-help using r-project.org> wrote:
>
> Indeed, the documentation appears to be incorrect... it is literally impossible for the function to accept a connection that has been opened with arbitrary mode and provide that kind of flexibility in line end handling.
>
> The fact that the documentation proceeds to claim that when a filename is supplied it defaults to mode "rt" means that it cannot provide this functionality even in the default case... the "t" mode pushes the line end handling down to the connection level, and ?connections claims that readLines somehow bypasses the connection mode... I certainly hope this boundary crossing claim is false.
>
> I suspect that what is actually happening here is that the text connection pre-reads the beginning of the file in raw mode, identifies the most common line ending, and then opens the file assuming that all line endings will have to conform with that type. There will be a speed penalty if the code has to identify each line end as it goes, so the "havoc" of fixing the code to agree with the documentation may indeed exceed the headache of fixing the documentation to agree with the code.
>
> On June 25, 2025 7:28:40 AM PDT, "Heuvel, E.G. van den (Guido)" <g.vandenheuvel using cbs.nl> wrote:
> >As a longtime programmer myself I agree in general with your remark "[T]he fact that your file does not conform with a consistent line end mark convention means that any "automatic" identification of line end conventions is completely optional." At the same time, the documentation of readLines() specifically states
> >
> >"Whatever mode the connection is opened in, any of LF, CRLF or CR will be accepted as the EOL marker for a line."
> >
> >If I interpret this correctly, this is exactly the type of generic line end handling algorithm that you describe. However, it appears to not be working correctly.
> >
> >-----Original message-----
> >From: Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
> >Sent: Wednesday, 25 June 2025 16:10
> >To: Heuvel, E.G. van den (Guido) <g.vandenheuvel using cbs.nl>; Heuvel, E.G. van den (Guido) via R-help <r-help using r-project.org>; 'r-help using R-project.org' <r-help using R-project.org>
> >Subject: Re: [R] Potential bug in readLines when reading empty lines
> >
> >[External email]
> >
> >As a longtime programmer, I would say that your file is at fault... there is no programming standard that says any software needs to handle this kind of data in any defined way. More specifically, the only standards-based requirements I am aware of require the programmer to specify whether the file is a text file (per the convention driven by the OS) or a binary file. The fact that your file does not conform with a consistent line end mark convention means that any "automatic" identification of line end conventions is completely optional.
> >
> >Looking at this from the perspective of a user, I think you have two options: fix the process that is feeding you invalid data, or use binary mode to implement the parsing behavior you wish to obtain for this file format.
> >
> >In addition, I suppose you could develop a generic line end handling algorithm that you think would resolve this and submit a suggestion/patch to R and hope someone agrees that such a change won't cause more havoc than it avoids for other users. But that would be unlikely to happen in a timely fashion for your current needs.
> >
> >On June 24, 2025 11:59:58 PM PDT, "Heuvel, E.G. van den (Guido) via R-help" <r-help using r-project.org> wrote:
> >>Hi all,
> >>
> >>I encountered some weird behaviour with readLines() recently, and I am wondering if this might be a bug, or, if it is not, how to resolve it. The issue is as follows:
> >>
> >>If I have a text file where a line ends with just a carriage return (\r, CR) while the next line is empty and ends in a carriage return / linefeed (\r\n, CR LF), then the empty line is skipped when reading the file with readLines. The following code contains a test case:
> >>
> >>---
> >>print(R.version)
> >># platform       x86_64-w64-mingw32
> >># arch           x86_64
> >># os             mingw32
> >># crt            ucrt
> >># system         x86_64, mingw32
> >># status
> >># major          4
> >># minor          4.0
> >># year           2024
> >># month          04
> >># day            24
> >># svn rev        86474
> >># language       R
> >># version.string R version 4.4.0 (2024-04-24 ucrt)
> >># nickname       Puppy Cup
> >>
> >>txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")
> >>
> >># Write txt_original as binary to avoid unwanted conversion of end of line markers
> >>writeBin(charToRaw(txt_original), "test.txt")
> >>
> >>txt_actual <- readLines("test.txt")
> >>print(txt_actual)
> >># [1] "Line 1" "Line 3"
> >> ---
> >>
> >>I included the output of this script on my machine in the comments. I would expect txt_actual to be equal to c("Line 1", "", "Line 3"), but the empty line is skipped.
> >>
> >>Is this a bug? And if not, how should I read test.txt in such a way that the empty 2nd line is left intact?
> >>
> >>Best regards,
> >>
> >>Guido van den Heuvel
> >>Statistics Netherlands
> >>
> >>______________________________________________
> >>R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide
> >>https://www.R-project.org/posting-guide.html
> >>and provide commented, minimal, self-contained, reproducible code.
> >
> >--
> >Sent from my phone. Please excuse my brevity.


