[R] The behaviour of read.csv().

David Winsemius dwinsemius at comcast.net
Fri Dec 3 03:18:11 CET 2010


On Dec 2, 2010, at 8:33 PM, Duncan Murdoch wrote:

snipped
>
> I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0.  
> The comment in the NEWS file suggests it was in response to some  
> strange csv file coming out of Excel.
>
> The real problem with the CSV format is that there really isn't a  
> well defined standard for it.  The first RFC about it was published  
> in 2005, and it doesn't claim to be authoritative.  Excel is kind of  
> a standard, but it does some very weird things.  (For example:   
> enter the string 01 into a field.  To keep the leading 0, you need  
> to type it as '01.  Save the file, read it back:  goodbye 0.  At  
> least that's what a website I was just on says about Excel, and what  
> OpenOffice does.)
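For the fill= point, here is a small sketch of what the option does. It uses the text= argument, which was added to read.table() after this thread. read.csv() defaults to fill = TRUE, so a short row is padded with NA. read.table() defaults to fill = FALSE (because blank.lines.skip = TRUE), so it refuses the same ragged input. The three-line input is made up for illustration:

```r
# Ragged CSV (illustrative data): the first data row has only two fields.
txt <- "a,b,c\n1,2\n3,4,5"

# read.csv() uses fill = TRUE by default, so the missing field becomes NA.
padded <- read.csv(text = txt)
print(padded)

# read.table() uses fill = FALSE by default here, so scan() stops with an
# error; we capture the error message instead of letting it propagate.
res <- tryCatch(read.table(text = txt, sep = ",", header = TRUE),
                error = function(e) conditionMessage(e))
print(res)
```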

In both Excel and OO.org you can select a column (or any other
range) and set its format to text. (The default is numeric, not that
different from read.table()'s default behavior.) Once a format has
been set, you no longer need leading quotes. I just created a small
example in OO.org Calc, entering values with leading "0"s and no
leading quotes, and this code runs as desired after copying the three
cells to the clipboard:

 > read.table(pipe("pbpaste"), colClasses="character")
     V1
1   01
2  004
3 0005
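pipe("pbpaste") only works on a Mac. Without a clipboard, the same comparison can be sketched with the text= argument of later versions of R. The three values below are an assumption standing in for the copied cells. Default type conversion turns them into integers and drops the zeros, while colClasses = "character" keeps them as typed:

```r
# Stand-in for the three clipboard cells (illustrative values).
txt <- "01\n004\n0005"

# Default type conversion: the column becomes integer, leading zeros vanish.
as_numbers <- read.table(text = txt)

# Forcing every column to character keeps the text exactly as written.
as_text <- read.table(text = txt, colClasses = "character")

print(as_numbers$V1)
print(as_text$V1)
```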

The same applies to date fields in both OO.org and Excel. In this
regard, it is simply a matter of understanding the defined behavior
of your software and how you can manipulate it. This is no different
from learning R's classes, coercing them to your ends, and dealing
with other formatting issues.

>
> I've been burned so many times by storing data in .csv files, that I  
> just avoid them whenever I can.

No argument there. I know one physician whose weapon of choice is
Stata who always uses "|" as his separator, but that's perhaps because
he works entirely in Windows. I imagine "|" is a less unusual
character on *NIX systems, where it is the shell's pipe operator.
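A minimal sketch of a pipe-separated round trip in R, assuming "|" never occurs inside a field. The data frame and temporary file are illustrative only. Commas inside the note column need no quoting, and colClasses = "character" again protects the leading zeros:

```r
# Illustrative data: an ID with leading zeros and a field containing a comma.
df <- data.frame(id = c("01", "002"), note = c("a, b", "c"),
                 stringsAsFactors = FALSE)

# Write with "|" as the separator; no quoting is needed for the comma.
tmp <- tempfile(fileext = ".txt")
write.table(df, tmp, sep = "|", quote = FALSE, row.names = FALSE)
print(readLines(tmp))

# Read it back, keeping every column as character.
back <- read.table(tmp, sep = "|", header = TRUE, colClasses = "character")
print(back)

unlink(tmp)  # clean up the temporary file
```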

--

David Winsemius, MD
West Hartford, CT