[R] Potential bug in readLines when reading empty lines

Enrico Schumann es using enricoschumann.net
Wed Jun 25 17:17:37 CEST 2025


Quoting "Heuvel, E.G. van den (Guido)" <g.vandenheuvel using cbs.nl>:

> -----Original message-----
> From: Enrico Schumann <es using enricoschumann.net>
> Sent: Wednesday, 25 June 2025 16:44
> To: Heuvel, E.G. van den (Guido) <g.vandenheuvel using cbs.nl>
> CC: 'r-help using R-project.org' <r-help using r-project.org>
> Subject: Re: [R] Potential bug in readLines when reading empty lines
>
>
> Quoting "Heuvel, E.G. van den (Guido) via R-help" <r-help using r-project.org>:
>
>> Hi all,
>>
>> I encountered some weird behaviour with readLines() recently, and I
>> am wondering if this might be a bug, or, if it is not, how to resolve
>> it. The issue is as follows:
>>
>> If I have a text file where a line ends with just a carriage return
>> (\r, CR) while the next line is empty and ends in a carriage return /
>> linefeed (\r\n, CR LF), then the empty line is skipped when reading
>> the file with readLines. The following code contains a test case:
>>
>> ---
>> print(R.version)
>> # platform       x86_64-w64-mingw32
>> # arch           x86_64
>> # os             mingw32
>> # crt            ucrt
>> # system         x86_64, mingw32
>> # status
>> # major          4
>> # minor          4.0
>> # year           2024
>> # month          04
>> # day            24
>> # svn rev        86474
>> # language       R
>> # version.string R version 4.4.0 (2024-04-24 ucrt)
>> # nickname       Puppy Cup
>>
>> txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")
>>
>> # Write txt_original as binary to avoid unwanted conversion of
>> # end-of-line markers
>> writeBin(charToRaw(txt_original), "test.txt")
>>
>> txt_actual <- readLines("test.txt")
>> print(txt_actual)
>> # [1] "Line 1" "Line 3"
>>  ---
>>
>> I included the output of this script on my machine in the comments.
>> I would expect txt_actual to be equal to c("Line 1", "", "Line 3"),
>> but the empty line is skipped.
>>
>> Is this a bug? And if not, how should I read test.txt in such a way
>> that the empty 2nd line is left intact?
>>
>> Best regards,
>>
>> Guido van den Heuvel
>> Statistics Netherlands
>
> What would be your "rule" for identifying lines? From your desired
> output, it seems \r should be end-of-line, and \n is to be ignored.
> Then you could do something like this:
>
>    raw <- readChar("test.txt", 1000)
>    raw <- gsub("\n", "", raw)
>    strsplit(raw, "\r")[[1]]
>    ## [1] "Line 1" ""       "Line 3"
>
> But it requires you to specify the number of characters to read (or  
> write a loop).
>
>
> My preferred rule would be the current documentation of the  
> readLines function. Specifically, the line "Whatever mode the  
> connection is opened in, any of LF, CRLF or CR will be accepted as  
> the EOL marker for a line."

As a workaround, you could do something like this:

   raw <- readChar("test.txt", 1000)
   raw <- gsub("\r\n", "\n", raw)
   raw <- gsub("\r", "\n", raw)
   strsplit(raw, "\n")[[1]]
   ## [1] "Line 1" ""       "Line 3"
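
If you would rather not hard-code the number of characters, a sketch along the following lines might work. It reads the whole file as raw bytes, using file.size() for the length, and then applies the same two substitutions. The function name read_lines_any_eol is made up for this illustration, and the sketch assumes the file fits in memory and contains no embedded NUL bytes (rawToChar would fail on those).

```r
## Hypothetical helper (name invented for this example): treat CRLF,
## lone CR and lone LF each as one end-of-line marker.
read_lines_any_eol <- function(path) {
    ## read every byte of the file, so no fixed length is needed
    bytes <- readBin(path, what = "raw", n = file.size(path))
    txt <- rawToChar(bytes)  # assumes no embedded NUL bytes
    ## CRLF first, so that a CRLF pair is not counted twice
    txt <- gsub("\r\n", "\n", txt, fixed = TRUE)
    txt <- gsub("\r", "\n", txt, fixed = TRUE)
    strsplit(txt, "\n", fixed = TRUE)[[1]]
}

## same bytes as in the test case from the original post
tmp <- tempfile()
writeBin(charToRaw(paste0("Line 1\r", "\r\n", "Line 3\r\n")), tmp)
res <- read_lines_any_eol(tmp)
print(res)
## [1] "Line 1" ""       "Line 3"
```

Replacing CRLF before the lone CR matters: in the other order, every CRLF would turn into two line breaks.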


Of course, if the true file was created with

   txt <- paste0("Line 1\r", "\n", "\r\n", "Line 4\r")

then the "\r" that ends line 1 and the "\n" that forms the empty line 2
will be read as a single CRLF, so those two lines are merged into one.
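
To make that ambiguity concrete, the same substitutions can be applied directly to that string (no file needed). This is only an illustration of the workaround above, under the assumption that the intended lines were "Line 1", "", "" and "Line 4":

```r
## four intended lines: "Line 1" (CR), "" (LF), "" (CRLF), "Line 4" (CR)
txt <- paste0("Line 1\r", "\n", "\r\n", "Line 4\r")

norm <- gsub("\r\n", "\n", txt, fixed = TRUE)
norm <- gsub("\r", "\n", norm, fixed = TRUE)
lines <- strsplit(norm, "\n", fixed = TRUE)[[1]]
print(lines)
## [1] "Line 1" ""       "Line 4"
```

Only three lines come back: once the pieces are pasted together, the bytes "\r\n" after "Line 1" cannot be told apart from a genuine CRLF.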



-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net


