[R] Unexpected behaviour of write.csv - read.csv
Gabor Grothendieck
ggrothendieck at gmail.com
Thu Jan 13 19:30:42 CET 2011
On Thu, Jan 13, 2011 at 1:06 PM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
> On Thu, 13 Jan 2011, Duncan Murdoch wrote:
>
>> On 11-01-13 6:26 AM, Rainer M Krug wrote:
>>>
>>> Hi
>>>
>>> Assuming the following:
>>>
>>>> x <- data.frame(a=1:10, b=runif(10))
>>>> str(x)
>>>
>>> 'data.frame': 10 obs. of 2 variables:
>>> $ a: int 1 2 3 4 5 6 7 8 9 10
>>> $ b: num 0.692 0.325 0.634 0.16 0.873 ...
>>>>
>>>> write.csv(x, "x.csv")
>>>> x2 <- read.csv("x.csv")
>>>> str(x2)
>>>
>>> 'data.frame': 10 obs. of 3 variables:
>>> $ X: int 1 2 3 4 5 6 7 8 9 10
>>> $ a: int 1 2 3 4 5 6 7 8 9 10
>>> $ b: num 0.692 0.325 0.634 0.16 0.873 ...
>>>>
>>>
>>> Using the two functions write.csv and read.csv, I would assume that the
>>> resulting data.frame x2 would be identical to x, but it has an additional
>>> column X, which contains the row names of x.
>>>
>>> I know read.table and write.table which work as expected, but I would
>>> like to use a csv for data exchange reasons.
>>>
>>> I know that I can use
>>> write.csv(x, "x.csv", row.names=FALSE)
>>>
>>> and it would work, but shouldn't that be the default behaviour?
>>
>> I don't think so. The CSV format is an export format which holds less
>> information than a dataframe. By exporting the dataframe to CSV and
>> importing the result, you are discarding information and you should expect
>> to get something different.
>
> You need to read it with read.csv("x.csv", row.names=1)
>
> Nothing in the csv format lets R know that the first column is the row names
> (in the format used by read.table, having a header that is one column short
> does). Now R could guess that a .csv file with an empty string for the
> first column name is meant to be the row names, but that would be merely a
> guess based on one (barely documented for spreadsheets) convention.
read.csv / read.table already use heuristics to determine the column
types, so adding this case to the heuristics would not be a departure
from the established philosophy.
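A minimal R sketch of the two options discussed above, assuming the file is written to a temporary path from tempfile(). Option 1 is the explicit read.csv(..., row.names = 1) call suggested by Brian Ripley. Option 2 illustrates the heuristic proposed here, written as a hypothetical helper read_csv_auto_rownames() that is not part of R: if the first header field is an empty string (as write.csv produces), treat column 1 as row names.

```r
# Sketch only: read_csv_auto_rownames() is a hypothetical helper, not part of base R.

x <- data.frame(a = 1:10, b = runif(10))
f <- tempfile(fileext = ".csv")
write.csv(x, f)  # header line is "","a","b": row names go into an unnamed first column

# Option 1 (explicit): tell read.csv that column 1 holds the row names.
x2 <- read.csv(f, row.names = 1)
str(x2)  # 2 variables, a and b, as in x

# Option 2 (heuristic, hypothetical): treat an empty first header field as row names.
read_csv_auto_rownames <- function(file, ...) {
  # Parse only the header line; scan() strips the double quotes write.csv adds.
  hdr <- scan(file, what = "", sep = ",", nlines = 1, quiet = TRUE)
  if (length(hdr) > 0 && identical(hdr[[1]], "")) {
    read.csv(file, row.names = 1, ...)
  } else {
    read.csv(file, ...)
  }
}

x3 <- read_csv_auto_rownames(f)
str(x3)
```

Note that the values in b may not be bit-for-bit identical after the round trip, since write.csv writes numbers as decimal text; compare them with all.equal() rather than identical().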
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com