[Rd] Read.dcf with no newline ending: gzfile drops last line
John Muschelli
muschellij2 at gmail.com
Mon Nov 14 16:32:16 CET 2016
I don't know if this is a bug per se, but it is an undesired behavior in
read.dcf. read.dcf takes a file argument and passes it to gzfile if
it's a character string:
    if (is.character(file)) {
        file <- gzfile(file)
        on.exit(close(file))
    }
This gzfile connection is passed to readLines (line #39):
    lines <- readLines(file)
If there is no newline at the end of the file, readLines on the gzfile
connection gives no warning (a warning would, I think, be the
appropriate behavior). If a DESCRIPTION file doesn't happen to have a
newline at the end of it (odd, but it may happen), then the last tag is
dropped:
> x = "Package: test
+ Type: Package"
>
> ######################################
> # No Newline in file
> ######################################
> fname = tempfile()
> writeLines(x, fname, sep = "")
>
> ### readlines with character - warning but all fields
> readLines(fname)
[1] "Package: test" "Type: Package"
Warning message:
In readLines(fname) :
incomplete final line found on
'/var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//Rtmpz95dsT/file180a65a6b745'
> ### readlines with file connection - warning but all fields
> file_con <- file(fname)
> readLines(file_con)
[1] "Package: test" "Type: Package"
Warning message:
In readLines(file_con) :
incomplete final line found on
'/var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//Rtmpz95dsT/file180a65a6b745'
>
> ### readlines with gzfile connection
> ## no warning and drops last field
> gz_con = gzfile(fname)
> readLines(gz_con) # ONLY 1 line!
[1] "Package: test"
>
> ######################################
> # Newline in file - fine
> ######################################
> ### readlines with gzfile connection
> ## no warning and all fields are read
> writeLines(x, fname, sep = "\n")
> gz_con = gzfile(fname)
> readLines(gz_con)
[1] "Package: test" "Type: Package"
As a workaround, I currently call file(fname) and pass that connection
to read.dcf, so that a warning occurs but all fields are read. I didn't
see anything in the read.dcf help about this. readLines states clearly:
"If the final line is incomplete (no final EOL marker) the behaviour
depends on whether the connection is blocking or not", but it's not
100% clear from the help that read.dcf uses gzfile even when the file
is not compressed.
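
For reference, the workaround mentioned above (going through file()
rather than handing read.dcf a file name) could be wrapped roughly as
follows. This is only a sketch: read_dcf_plain is a hypothetical helper
name, not part of base R, and it assumes the file is uncompressed plain
text.

```r
# Hypothetical helper (not part of base R): read a DCF file through a
# plain file() connection instead of letting read.dcf wrap the path in
# gzfile(). Assumes `path` points to an uncompressed text file.
read_dcf_plain <- function(path, ...) {
  con <- file(path)
  # Destroy the connection when the function exits, whether or not
  # read.dcf opened it.
  on.exit(close(con))
  read.dcf(con, ...)
}

# Usage with a file that lacks a trailing newline, as in the example
# earlier in this message.
fname <- tempfile()
writeLines("Package: test\nType: Package", fname, sep = "")
res <- read_dcf_plain(fname)
print(res)
```

Whether a warning about the incomplete final line is emitted may depend
on the R version; the point of the sketch is only that both fields are
read.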
Thanks
John