[R] The behaviour of read.csv().
dwinsemius at comcast.net
Fri Dec 3 03:18:11 CET 2010
On Dec 2, 2010, at 8:33 PM, Duncan Murdoch wrote:
> I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0.
> The comment in the NEWS file suggests it was in response to some
> strange csv file coming out of Excel.
> The real problem with the CSV format is that there really isn't a
> well defined standard for it. The first RFC about it was published
> in 2005, and it doesn't claim to be authoritative. Excel is kind of
> a standard, but it does some very weird things. (For example:
> enter the string 01 into a field. To keep the leading 0, you need
> to type it as '01. Save the file, read it back: goodbye 0. At
> least that's what a website I was just on says about Excel, and what
> OpenOffice does.)
In both Excel and OO.org you can select a column (or any other
range) and set its format to text. (The default is numeric, not that
different from read.table()'s default behavior.) Once a format has
been set, you no longer need leading quotes. I just created a small
example in OO.org Calc, entering values with a leading "0" and no
leading quotes, and this code runs as desired after copying the three
cells to the clipboard:
> read.table(pipe("pbpaste"), colClasses="character")
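pbpaste is a Mac OS X utility, so for anyone on another platform here
is a minimal sketch of the same idea with inline data. The three
values are hypothetical stand-ins for the copied cells, and the text=
argument assumes a version of R recent enough to have it;
textConnection() should serve the same purpose otherwise.

```r
# Hypothetical three-cell column with leading zeros (stand-in for the clipboard)
cells <- c("01", "007", "10")

# Default behaviour: type conversion makes the column integer, dropping the zeros
as_default <- read.table(text = cells)

# colClasses = "character" skips the conversion and keeps each field as typed
as_text <- read.table(text = cells, colClasses = "character")

str(as_default$V1)
str(as_text$V1)
```

The same colClasses argument is accepted by read.csv(), since it
passes its extra arguments on to read.table().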
The same applies to date fields in both OO.org and Excel. In this
regard, it is simply a matter of understanding the defined behavior
of your software and how you can manipulate it. This is no
different than learning R's classes, coercing them to your ends, and
dealing with other formatting issues.
> I've been burned so many times by storing data in .csv files, that I
> just avoid them whenever I can.
No argument there. I know one physician whose weapon of choice is
Stata who always uses "|" as his separator, but that's perhaps because
he works entirely in Windows. I imagine that might not be the most
uncommon character in *NIXen.
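
For what it is worth, reading such a file in R only needs the sep
argument. A rough sketch with made-up records follows; the column
names and values are hypothetical, and the commas inside the free-text
field are my own guess at why someone would prefer "|".

```r
# Hypothetical pipe-delimited records; the note field contains commas
records <- c("id|note",
             "01|fever, cough",
             "02|none")

# sep = "|" splits on the pipe only, so embedded commas are left alone;
# quote = "" turns off quote handling, colClasses keeps the leading zeros
d <- read.table(text = records, sep = "|", header = TRUE,
                quote = "", colClasses = "character")

print(d)
```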
David Winsemius, MD
West Hartford, CT