[Rd] On read.csv and write.csv

Thu Jul 1 09:55:08 CEST 2021

Stephen, 

I am sure one can find a lot of small issues and inconsistencies in R and its standard library. It has to support a lot of legacy cruft, and the design process, especially in the early days, focused on getting things done rather than on delivering a standard library of immaculate quality. It is also far too late to make dramatic changes unless you want to risk breaking existing software. That ship sailed decades ago. 

Personally, I taught myself a while ago to always use explicit configuration when calling built-in functions, and in the last couple of years I have replaced them completely with other packages (such as readr) that come with (arguably) saner defaults and better diagnostics. 
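
As a minimal sketch of what explicit configuration could look like for the case described below, using base R only (the fixed Y values and the temporary file are illustrative assumptions, not part of the original example):

```r
# Same shape as the data frame in the quoted example, with fixed values
# so the result is reproducible (assumption: rnorm() replaced by constants).
D1 <- data.frame(A = letters[1:5], N = 1:5, Y = c(0.5, 1.5, 2.5, 3.5, 4.5))
tmp <- tempfile(fileext = ".csv")

# Option 1: do not write the row names at all.
write.csv(D1, tmp, row.names = FALSE)
D1a <- read.csv(tmp)
names(D1a)  # A, N, Y: no extra X column

# Option 2: keep the row names on write and state on read
# which column holds them.
write.csv(D1, tmp)
D1b <- read.csv(tmp, row.names = 1)
names(D1b)  # A, N, Y: the first column became the row names

# Tidy up
unlink(tmp)
```

Either option avoids relying on the defaults of the two functions agreeing with each other; which one fits depends on whether the row names carry information worth keeping.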

Best, 

Taras

> On 30 Jun 2021, at 23:15, Stephen Ellison <S.Ellison using LGCGroup.com> wrote:
> 
> Apologies if this is a well-worn question; I haven’t found it so far but there's a lot of r-dev and I may have missed it in the archives. In the meantime:
> 
> I've managed to avoid writing csv files with R for a couple of decades but we're swopping data with a collaborator and I've tripped over an inconsistency between read.csv and write.csv that seems less than helpful.
> The default row-name behaviour of read.csv is to assume that, when the header row has one fewer item than the rows that follow, the first column contains row names. write.csv, however, writes an empty string ("") as the first header entry, above the row names. On rereading, the original row names are then treated as a data column with no name, which read.csv labels "X".
> 
> That means that, unlike the read.table and write.table pair, something written with write.csv is not read back correctly by read.csv.
> 
> Is that intentional?
> And whether it is intentional or not, is it wise?
> 
> Example:
> 
> ( D1 <- data.frame(A=letters[1:5], N=1:5, Y=rnorm(5) ) )
> write.csv(D1, "temp.csv")
> 
> ( D1w <- read.csv("temp.csv") )
> 
> # Note the unnecessary new X column ...
> # Tidy up
> unlink("temp.csv")
> 
> This differs from the parent .table defaults; write.table doesn’t add the extra "" column label, so the object read back with read.table does not contain an unwanted extra column.
> 
> Wouldn’t it be more sensible if write.csv() and read.csv() were consistent in the same sense as read.table and write.table?
> Or at least if there were a switch (as.read.csv=TRUE ?) to tell write.csv to omit the initial "", or vice versa?
> 
> Currently using R version 4.1.0 on Windows, but this reproduces at least as far back as 3.6 
> 
> Steve E