[R] The behaviour of read.csv().

Fri Dec 3 02:33:24 CET 2010

On 02/12/2010 8:04 PM, Peter Ehlers wrote:
> On 2010-12-02 16:26, Rolf Turner wrote:
>>
>> On 3/12/2010, at 1:08 PM, Phil Spector wrote:
>>
>>> Rolf -
>>>      I'd suggest using
>>>
>>>       junk<- read.csv("junk.csv",header=TRUE,fill=FALSE)
>>>
>>> if you don't want the behaviour you're seeing.
>>
>>
>> The point is not that I don't want this kind of behaviour.
>> The point is that it seems to me to be unexpected and dangerous.
>>
>> I can indeed take precautions against it, now that I know about it,
>> by specifying fill=FALSE.  Given that I remember to do so.
>>
>> Now that you've pointed it out I can see that this is the reason
>> for the different behaviour between read.table() and read.csv();
>> in read.table() fill=FALSE is effectively the default.
>>
>> Having fill=TRUE being the default in read.csv() strikes me as
>> being counter-intuitive and dangerous.
>>
>
> Rolf,
> This is not to argue with your point re counter-intuitive,
> but I always run a count.fields() first if I haven't seen
> (or can't easily see) the file in my editor. I must have
> learned that the hard way a long time ago.
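
A minimal sketch of that kind of check, in case it helps.  The temporary
file and its contents are made up for illustration (this is not Rolf's
junk.csv), and the quote and comment.char values are chosen to match
read.csv()'s defaults:

```r
# Hypothetical ragged file: the third line has only two fields.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5", "6,7,8"), tmp)

# Count the fields on each line, using the separator, quoting and
# comment settings that read.csv() uses by default.
n <- count.fields(tmp, sep = ",", quote = "\"", comment.char = "")

# Any line whose count differs from the header's deserves a look.
bad <- which(n != n[1])
print(table(n))
print(bad)
```

If bad is non-empty, it is worth opening the file at those lines before
trusting whatever read.csv() returns.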

I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0. 
The comment in the NEWS file suggests it was in response to some strange 
csv file coming out of Excel.
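
To make the difference concrete, here is a small sketch comparing the two
settings.  The file is invented for the example and only shows the
short-line case, which may not be exactly what Rolf ran into:

```r
# Hypothetical file whose third line is one field short.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5", "6,7,8"), tmp)

# Default read.csv() (fill = TRUE): the short line is padded with NA
# rather than rejected.
filled <- read.csv(tmp, header = TRUE)

# fill = FALSE: the short line is reported as an error instead.
strict <- tryCatch(read.csv(tmp, header = TRUE, fill = FALSE),
                   error = function(e) e)

print(filled)
print(inherits(strict, "error"))
```

With fill=TRUE the result has three rows and an NA in column c; with
fill=FALSE the call fails, which is at least hard to overlook.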

The real problem with the CSV format is that there really isn't a
well-defined standard for it.  The first RFC about it (RFC 4180) was
published in 2005, and it doesn't claim to be authoritative.  Excel is
kind of a standard, but it does some very weird things.  (For example:
enter the string 01 into a field.  To keep the leading 0, you need to
type it as '01.  Save the file, read it back:  goodbye 0.  At least
that's what a website I was just on says about Excel, and it is what
OpenOffice does.)
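
The same kind of loss can happen on the R side, quite apart from Excel.
A small sketch, with made-up inline data, of how read.csv() treats such
a field by default and one way to keep it as text (colClasses applied to
every column here, purely for brevity):

```r
# Hypothetical data with zero-padded identifiers.
csv <- "id,name\n01,x\n02,y"

# Default type conversion reads the id column as numbers: the zeros go.
default <- read.csv(text = csv)

# Reading every column as character keeps the field exactly as written.
kept <- read.csv(text = csv, colClasses = "character")

print(default$id)
print(kept$id)
```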

I've been burned so many times by storing data in .csv files that I
just avoid them whenever I can.

Duncan Murdoch