[R] The behaviour of read.csv().

Duncan Murdoch murdoch.duncan at gmail.com
Fri Dec 3 03:33:28 CET 2010

On 02/12/2010 9:18 PM, David Winsemius wrote:
> On Dec 2, 2010, at 8:33 PM, Duncan Murdoch wrote:
> snipped
>> I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0.
>> The comment in the NEWS file suggests it was in response to some
>> strange csv file coming out of Excel.
>> The real problem with the CSV format is that there really isn't a
>> well defined standard for it.  The first RFC about it was published
>> in 2005, and it doesn't claim to be authoritative.  Excel is kind of
>> a standard, but it does some very weird things.  (For example:
>> enter the string 01 into a field.  To keep the leading 0, you need
>> to type it as '01.  Save the file, read it back:  goodbye 0.  At
>> least that's what a website I was just on says about Excel, and what
>> OpenOffice does.)
> In both Excel and in OO.org you can select a column (or any other
> range) and set its format to text. (The default is numeric, not that
> different from read.table()'s default behavior.) Once a format has
> been set, you then do not need leading quotes. I just created a small
> example in OO.org Calc, entering values with leading "0"s and no
> leading quotes, and this code runs as desired after copying the three
> cells to the clipboard:
>   >  read.table(pipe("pbpaste"), colClasses="character")
>       V1
> 1   01
> 2  004
> 3 0005
> The same applies to date fields in both OO.org and Excel. In this
> regard, it is simply a matter of understanding what is the defined
> behavior of your software and how one can manipulate it. This is no
> different than learning R's classes, coercing them to your ends, and
> dealing with other formatting issues.

You're right, I shouldn't have picked on Excel particularly here, but it
really is a bizarre format that says the default way to read a file
is to assume that the column contains numeric values.  (Yes, read.csv()
makes this same assumption.)  My main complaint is with the format.
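
A minimal sketch of that default, assuming a made-up three-line file
(csv_text and the column name "code" are hypothetical, not data from this
thread).  It uses the text= argument of read.csv(), which needs a
reasonably recent R:

```r
# Hypothetical CSV content: one header line, three codes with leading zeros.
csv_text <- "code\n01\n004\n0005"

# Default behaviour: the column type is guessed, so the codes are read as
# integers and the leading zeros are dropped (values 1, 4, 5).
default_read <- read.csv(text = csv_text)
str(default_read)

# Declaring the column as character keeps each field exactly as written.
char_read <- read.csv(text = csv_text, colClasses = "character")
str(char_read)
```

The colClasses="character" call is the same idea as in David's pbpaste
example above, just without the clipboard.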

Duncan Murdoch

>> I've been burned so many times by storing data in .csv files, that I
>> just avoid them whenever I can.
> No argument there. I know one physician whose weapon of choice is
> Stata who always uses "|" as his separator, but that's perhaps because
> he works entirely in Windows. I imagine that might not be the most
> uncommon character in *NIXen.
> --
> David Winsemius, MD
> West Hartford, CT
