[R] Unexpected behaviour of write.csv - read.csv

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Jan 13 19:06:31 CET 2011


On Thu, 13 Jan 2011, Duncan Murdoch wrote:

> On 11-01-13 6:26 AM, Rainer M Krug wrote:
>> Hi
>> 
>> Assuming the following:
>> 
>>> x<- data.frame(a=1:10, b=runif(10))
>>> str(x)
>> 'data.frame':	10 obs. of  2 variables:
>>   $ a: int  1 2 3 4 5 6 7 8 9 10
>>   $ b: num  0.692 0.325 0.634 0.16 0.873 ...
>>> write.csv(x, "x.csv")
>>> x2<- read.csv("x.csv")
>>> str(x2)
>> 'data.frame':	10 obs. of  3 variables:
>>   $ X: int  1 2 3 4 5 6 7 8 9 10
>>   $ a: int  1 2 3 4 5 6 7 8 9 10
>>   $ b: num  0.692 0.325 0.634 0.16 0.873 ...
>>> 
>> 
>> Using the two functions write.csv and read.csv, I would assume that the
>> resulting data.frame x2 would be identical to x, but it has an additional
>> column X, which contains the row names of x.
>> 
>> I know that read.table and write.table work as expected, but I would
>> like to use CSV for data exchange.
>> 
>> I know that I can use
>> write.csv(x, "x.csv", row.names=FALSE)
>> 
>> and it would work, but shouldn't that be the default behaviour?
>
> I don't think so.  The CSV format is an export format which holds less 
> information than a dataframe.  By exporting the dataframe to CSV and 
> importing the result, you are discarding information and you should expect to 
> get something different.

You need to read it with read.csv("x.csv", row.names=1)

Nothing in the csv format lets R know that the first column is the row 
names (in the format used by read.table, having a header that is one 
column short does).  Now R could guess that a .csv file with an empty 
string for the first column name is meant to be the row names, but 
that would be merely a guess based on one (barely documented for 
spreadsheets) convention.
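
A minimal sketch of the difference, assuming temporary files in place of
"x.csv"; the names f, g, x2, x3 and x4 are only illustrative:

```r
set.seed(1)
x <- data.frame(a = 1:10, b = runif(10))

# CSV: write.csv() puts the row names in a first column whose header is "".
f <- tempfile(fileext = ".csv")
write.csv(x, f)

x2 <- read.csv(f)                 # first column is read as data, named "X"
x3 <- read.csv(f, row.names = 1)  # first column is used as the row names

names(x2)
names(x3)

# write.table() format: the header line is one field short, so
# read.table() takes the first column as the row names by itself.
g <- tempfile(fileext = ".txt")
write.table(x, g)
x4 <- read.table(g, header = TRUE)

names(x4)
```

With row.names = 1 the caller states explicitly which column holds the row
names, so nothing has to be guessed from the file.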

> If you want to save a dataframe to disk and read it back unchanged, you 
> should use save() and load().

Or one of the other serialization options such as serialize() and 
.saveRDS().  R's own admin uses .saveRDS() for such purposes.
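
A minimal sketch of these round trips, assuming a later R version in which
the public functions saveRDS() and readRDS() are available (as far as I know
they replaced the internal .saveRDS() from R 2.13.0 on); the file and
variable names are only illustrative:

```r
set.seed(1)
x <- data.frame(a = 1:10, b = runif(10))

# save()/load(): stores the object together with its name.
f_rda <- tempfile(fileext = ".RData")
save(x, file = f_rda)
e <- new.env()
load(f_rda, envir = e)       # restores an object called "x" into e
same_load <- identical(x, e$x)

# saveRDS()/readRDS(): stores a single object without its name.
f_rds <- tempfile(fileext = ".rds")
saveRDS(x, f_rds)
y <- readRDS(f_rds)
same_rds <- identical(x, y)

# serialize()/unserialize(): the same idea, to a raw vector in memory.
z <- unserialize(serialize(x, NULL))
same_ser <- identical(x, z)

c(same_load, same_rds, same_ser)
```

Unlike the CSV route, these keep the row names and column types without any
extra arguments on reading.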


>
> Duncan Murdoch
>
>
>> And if this is not compliant with csv files, shouldn't the function
>> read.csv convert the first column into the row names?
>> 
>> Cheers,
>> 
>> Rainer
>> 
>> --
>> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
>> Biology, UCT), Dipl. Phys. (Germany)
>> 
>> Centre of Excellence for Invasion Biology
>> Natural Sciences Building
>> Office Suite 2039
>> Stellenbosch University
>> Main Campus, Merriman Avenue
>> Stellenbosch
>> South Africa
>> 
>> Tel:        +33 - (0)9 53 10 27 44
>> Cell:       +27 - (0)8 39 47 90 42
>> Fax (SA):   +27 - (0)8 65 16 27 82
>> Fax (D) :   +49 - (0)3 21 21 25 22 44
>> Fax (FR):   +33 - (0)9 58 10 27 44
>> email:      Rainer at krugs.de
>> 
>> Skype:      RMkrug
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


