[R] Potential bug in readLines when reading empty lines
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Wed Jun 25 20:57:45 CEST 2025
At 15:43 on 25/06/2025, Enrico Schumann wrote:
>
> Quoting "Heuvel, E.G. van den (Guido) via R-help" <r-help using r-project.org>:
>
>> Hi all,
>>
>> I encountered some weird behaviour with readLines() recently, and I am
>> wondering if this might be a bug, or, if it is not, how to resolve it.
>> The issue is as follows:
>>
>> If I have a text file where a line ends with just a carriage return
>> (\r, CR) while the next line is empty and ends in a carriage return /
>> linefeed (\r\n, CR LF), then the empty line is skipped when reading
>> the file with readLines. The following code contains a test case:
>>
>> ---
>> print(R.version)
>> # platform x86_64-w64-mingw32
>> # arch x86_64
>> # os mingw32
>> # crt ucrt
>> # system x86_64, mingw32
>> # status
>> # major 4
>> # minor 4.0
>> # year 2024
>> # month 04
>> # day 24
>> # svn rev 86474
>> # language R
>> # version.string R version 4.4.0 (2024-04-24 ucrt)
>> # nickname Puppy Cup
>>
>> txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")
>>
>> # Write txt_original as binary to avoid unwanted conversion of
>> # end-of-line markers
>> writeBin(charToRaw(txt_original), "test.txt")
>>
>> txt_actual <- readLines("test.txt")
>> print(txt_actual)
>> # [1] "Line 1" "Line 3"
>> ---
>>
>> I included the output of this script on my machine in the comments. I
>> would expect txt_actual to be equal to c("Line 1", "", "Line 3"), but
>> the empty line is skipped.
>>
>> Is this a bug? And if not, how should I read test.txt in such a way
>> that the empty 2nd line is left intact?
>>
>> Best regards,
>>
>> Guido van den Heuvel
>> Statistics Netherlands
>
> What would be your "rule" for identifying lines? From your desired output,
> it seems \r should be end-of-line, and \n is to be ignored. Then you could
> do something like this:
>
> raw <- readChar("test.txt", 1000)
> raw <- gsub("\n", "", raw)
> strsplit(raw, "\r")[[1]]
> ## [1] "Line 1" "" "Line 3"
>
> But it requires you to specify the number of characters to read (or write a
> loop).
>
>
Hello,
A related attempt, whose output is also a mess:
readChar("test.txt", n = file.size("test.txt")) |>
textConnection() |>
readLines()
#> [1] "Line 1" "" "" "Line 3" ""
Is this specific to Windows and to the way it treats "\r"? When I open
the file in Notepad I see only two lines of text, but when I open it
with vim, it shows
Line 1^M
Line 3
so the carriage return is there.
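The same check can be made from within R by looking at the raw bytes
instead of relying on an editor. This is only a sketch: it recreates the
test file in a temporary location (the names `path`, `bytes`, `cr`, `lf`
and `lone_cr` are made up for illustration) and locates every CR (0x0d)
and LF (0x0a).

```r
# Recreate the test file from the original post in a temporary location
txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")
path <- tempfile(fileext = ".txt")
writeBin(charToRaw(txt_original), path)

# Read the file back as raw bytes, so no end-of-line translation can occur
bytes <- readBin(path, what = "raw", n = file.size(path))
print(bytes)

# Positions of carriage returns (0x0d) and line feeds (0x0a)
cr <- which(bytes == as.raw(0x0d))
lf <- which(bytes == as.raw(0x0a))

# A CR that is not immediately followed by an LF is a "lone" CR
lone_cr <- cr[!((cr + 1L) %in% lf)]
print(lone_cr)
```

For this file the lone CR is the one that ends "Line 1"; the other two
CRs are each part of a CR LF pair.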
(And when I paste it here
Line 1
Line 3
the <Ctrl+M> becomes an empty line.)
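If the rule is "any of CR LF, CR or LF ends a line", one possible
workaround is to read the bytes and split them yourself, which also
avoids hard-coding the number of characters. This is a sketch, not a
statement about what readLines() is meant to do: `read_lines_any_eol`
is a hypothetical helper, and it assumes the file fits in memory,
contains no embedded NUL bytes, and is in an encoding where the bytes
0x0d and 0x0a only ever mean CR and LF (ASCII, Latin-1 or UTF-8, for
example).

```r
# Hypothetical helper: split a file on CR LF, lone CR, or lone LF
read_lines_any_eol <- function(path) {
  # Read raw bytes so that no end-of-line translation is applied
  bytes <- readBin(path, what = "raw", n = file.size(path))
  txt <- rawToChar(bytes)
  # With perl = TRUE the alternatives are tried left to right,
  # so CR LF is consumed as one marker before a lone CR is considered
  strsplit(txt, "\r\n|\r|\n", perl = TRUE)[[1]]
}

# Recreate the test file from the original post in a temporary location
txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")
path <- tempfile(fileext = ".txt")
writeBin(charToRaw(txt_original), path)

result <- read_lines_any_eol(path)
print(result)
```

On the test file this keeps the empty second line. Note that strsplit()
drops a trailing empty string, so the final CR LF does not produce an
extra empty element.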
Hope this helps,
Rui Barradas