[R] Unexpected behaviour of write.csv - read.csv

Rainer M Krug r.m.krug at gmail.com
Fri Jan 14 09:08:23 CET 2011


On 01/13/2011 07:06 PM, Prof Brian Ripley wrote:
> On Thu, 13 Jan 2011, Duncan Murdoch wrote:
> 
>> On 11-01-13 6:26 AM, Rainer M Krug wrote:
>>>> Hi
>>>>
>>>> Assuming the following:
>>>>
>>>> > x <- data.frame(a=1:10, b=runif(10))
>>>> > str(x)
>>>> 'data.frame':    10 obs. of  2 variables:
>>>>  $ a: int  1 2 3 4 5 6 7 8 9 10
>>>>  $ b: num  0.692 0.325 0.634 0.16 0.873 ...
>>>> > write.csv(x, "x.csv")
>>>> > x2 <- read.csv("x.csv")
>>>> > str(x2)
>>>> 'data.frame':    10 obs. of  3 variables:
>>>>  $ X: int  1 2 3 4 5 6 7 8 9 10
>>>>  $ a: int  1 2 3 4 5 6 7 8 9 10
>>>>  $ b: num  0.692 0.325 0.634 0.16 0.873 ...
>>>> >
>>>>
>>>> Using the two functions write.csv and read.csv, I would assume that the
>>>> resulting data.frame x2 would be identical to x, but it has an additional
>>>> column X, which contains the row names of x.
>>>>
>>>> I know read.table and write.table, which work as expected, but I would
>>>> like to use CSV for data exchange reasons.
>>>>
>>>> I know that I can use
>>>> write.csv(x, "x.csv", row.names=FALSE)
>>>>
>>>> and it would work, but shouldn't that be the default behaviour?
>>>
>>> I don't think so.  The CSV format is an export format which holds less
>>> information than a dataframe.  By exporting the dataframe to CSV and
>>> importing the result, you are discarding information and you should
>>> expect to get something different.
> 
>> You need to read it with read.csv("x.csv", row.names=1)

Thanks - that makes sense.
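
For the archive, a minimal sketch of that round trip. The temporary file and the names x2 and x3 are only my illustration; the row.names=FALSE variant is the one from my original mail.

```r
# Round trip through CSV without picking up an extra column.
x <- data.frame(a = 1:10, b = runif(10))
f <- tempfile(fileext = ".csv")    # temporary file, stands in for "x.csv"

# Variant 1: keep the row names in the file and declare them on reading.
write.csv(x, f)                    # row names become the first column
x2 <- read.csv(f, row.names = 1)   # column 1 is used as row names again
str(x2)                            # two variables, a and b

# Variant 2: do not write the row names at all.
write.csv(x, f, row.names = FALSE)
x3 <- read.csv(f)
str(x3)                            # again two variables, a and b

# The numeric column is compared with a tolerance, because the text
# representation in the file keeps only a limited number of digits.
all.equal(x$b, x2$b)
unlink(f)
```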

> 
>> Nothing in the csv format lets R know that the first column is the row
>> names (in the format used by read.table, having a header that is one
>> column short does).  Now R could guess that a .csv file with an empty
>> string for the first column name is meant to be the row names, but that
>> would be merely a guess based on one (barely documented for
>> spreadsheets) convention.

OK - accepted. Relying on conventions which are only barely documented is
the first step towards incompatibilities, and that is the last thing I
would like to have.

Just for clarification, it might be useful to state this in the help
page - or did I miss it there? - as this is an important difference
between write.table and write.csv.
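
To make the difference concrete, here is a small sketch that writes the same data frame with both functions and looks at the header lines. The data values and temporary files are my own illustration, not something from the posts above.

```r
# Compare the header lines produced by write.table and write.csv.
x <- data.frame(a = 1:3, b = c(0.5, 1.5, 2.5))
f_tab <- tempfile(fileext = ".txt")
f_csv <- tempfile(fileext = ".csv")

write.table(x, f_tab)   # default settings, row names included
write.csv(x, f_csv)     # default settings, row names included

readLines(f_tab, n = 1) # "\"a\" \"b\""       header is one field short
readLines(f_csv, n = 1) # "\"\",\"a\",\"b\""  empty name for the row-name column

# read.table uses the short header to recognise the row names;
# read.csv sees three named columns and calls the first one X.
names(read.table(f_tab))
names(read.csv(f_csv))
unlink(c(f_tab, f_csv))
```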

> 
>>> If you want to save a dataframe to disk and read it back unchanged,
>>> you should use save() and load().
> 
>> Or one of the other serialization options such as serialize() and
>> .saveRDS().  R's own admin uses .saveRDS() for such purposes.

They look exactly like what I was looking for, but it says in the help page:

####################################
Details:
     Since these are internal, the file format is subject to change
     without notice.  The current format is that of ‘serialize’,
     compressed as if by ‘gzip’ if ‘compress = TRUE’.
####################################

This sounds frightening - unless the existing format version is kept and
can still be read even if the default version changes.
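
In the meantime, a sketch of the documented alternatives mentioned above, save()/load() and serialize()/unserialize(). The file name and the helper names x_orig, raw_x and x3 are only assumptions for illustration; I have left out the internal .saveRDS() because of the warning quoted above.

```r
# Store a data frame and read it back unchanged, without going through CSV.
x <- data.frame(a = 1:10, b = runif(10))
x_orig <- x                          # reference copy for the comparison

# save()/load(): the object is stored together with its name.
f_rda <- tempfile(fileext = ".RData")
save(x, file = f_rda)
rm(x)
load(f_rda)                          # recreates 'x' in the current environment
identical(x, x_orig)                 # exact comparison, no tolerance

# serialize()/unserialize(): here to a raw vector in memory.
raw_x <- serialize(x_orig, connection = NULL)
x3 <- unserialize(raw_x)
identical(x3, x_orig)
unlink(f_rda)
```

Unlike the CSV route, both variants keep the column types, the row names and the full numeric precision.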

Cheers,

Rainer

> 
> 
>>>
>>> Duncan Murdoch
>>>
>>>
>>>> And if this is not compliant with csv files, shouldn't the function
>>>> read.csv convert the first column into the row names?
>>>>
>>>> Cheers,
>>>>
>>>> Rainer
> 
>>>
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Tel:        +33 - (0)9 53 10 27 44
Cell:       +27 - (0)8 39 47 90 42
Fax (SA):   +27 - (0)8 65 16 27 82
Fax (D) :   +49 - (0)3 21 21 25 22 44
Fax (FR):   +33 - (0)9 58 10 27 44
email:      Rainer at krugs.de

Skype:      RMkrug


