[R] The behaviour of read.csv().
David Scott
d.scott at auckland.ac.nz
Fri Dec 3 03:48:56 CET 2010
On 03/12/10 14:33, Duncan Murdoch wrote:
> On 02/12/2010 8:04 PM, Peter Ehlers wrote:
>> On 2010-12-02 16:26, Rolf Turner wrote:
>>> On 3/12/2010, at 1:08 PM, Phil Spector wrote:
>>>
>>>> Rolf -
>>>> I'd suggest using
>>>>
>>>> junk <- read.csv("junk.csv", header = TRUE, fill = FALSE)
>>>>
>>>> if you don't want the behaviour you're seeing.
>>>
>>> The point is not that I don't want this kind of behaviour.
>>> The point is that it seems to me to be unexpected and dangerous.
>>>
>>> I can indeed take precautions against it, now that I know about it,
>>> by specifying fill=FALSE. Given that I remember to do so.
>>>
>>> Now that you've pointed it out I can see that this is the reason
>>> for the different behaviour between read.table() and read.csv();
>>> in read.table() fill=FALSE is effectively the default.
>>>
>>> Having fill=TRUE being the default in read.csv() strikes me as
>>> being counter-intuitive and dangerous.
>>>
>> Rolf,
>> This is not to argue with your point re counter-intuitive,
>> but I always run a count.fields() first if I haven't seen
>> (or can't easily see) the file in my editor. I must have
>> learned that the hard way a long time ago.
> I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0.
> The comment in the NEWS file suggests it was in response to some strange
> csv file coming out of Excel.
>
> The real problem with the CSV format is that there really isn't a well
> defined standard for it. The first RFC about it was published in 2005,
> and it doesn't claim to be authoritative. Excel is kind of a standard,
> but it does some very weird things. (For example: enter the string 01
> into a field. To keep the leading 0, you need to type it as '01. Save
> the file, read it back: goodbye 0. At least that's what a website I
> was just on says about Excel, and what OpenOffice does.)
>
> I've been burned so many times by storing data in .csv files, that I
> just avoid them whenever I can.
Absolutely agree with this, Duncan. Playing around with .csv files is
like playing with some sort of unstable explosive. I also avoid them as
much as possible.
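
When a .csv file can't be avoided, the two precautions quoted above
(Peter's count.fields() check and Phil's fill=FALSE) can be combined.
What follows is only a minimal sketch, not a recommended recipe: the
file contents and the names path, n, ragged and result are made up
for illustration, and the file is written to a temporary location.

```r
# Hypothetical ragged file: the third line has only two fields.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5",
             "6,7,8"), path)

# Count the fields on every line, using the separator and quote
# character that read.csv() uses by default.
n <- count.fields(path, sep = ",", quote = "\"")
ragged <- length(unique(n)) > 1   # TRUE when the lines disagree

# With fill = FALSE a ragged file stops with an error instead of
# being padded silently; here the error message is captured.
result <- tryCatch(
  read.csv(path, header = TRUE, fill = FALSE),
  error = function(e) conditionMessage(e)
)
```

For this file n is 3 3 2 3, so ragged is TRUE, and result holds the
error message rather than a data frame.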
David Scott
> Duncan Murdoch
--
_________________________________________________________________
David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email: d.scott at auckland.ac.nz, Fax: +64 9 373 7018
Director of Consulting, Department of Statistics