[R] Unexpected behaviour of write.csv - read.csv
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Jan 13 19:06:31 CET 2011
On Thu, 13 Jan 2011, Duncan Murdoch wrote:
> On 11-01-13 6:26 AM, Rainer M Krug wrote:
>> Hi
>>
>> Assuming the following:
>>
>>> x <- data.frame(a=1:10, b=runif(10))
>>> str(x)
>> 'data.frame': 10 obs. of 2 variables:
>> $ a: int 1 2 3 4 5 6 7 8 9 10
>> $ b: num 0.692 0.325 0.634 0.16 0.873 ...
>>> write.csv(x, "x.csv")
>>> x2 <- read.csv("x.csv")
>>> str(x2)
>> 'data.frame': 10 obs. of 3 variables:
>> $ X: int 1 2 3 4 5 6 7 8 9 10
>> $ a: int 1 2 3 4 5 6 7 8 9 10
>> $ b: num 0.692 0.325 0.634 0.16 0.873 ...
>>>
>>
>> Using the two functions write.csv and read.csv, I would assume that the
>> resulting data.frame x2 would be identical to x, but it has an additional
>> column X, which contains the row names of x.
>>
>> I know that read.table and write.table work as expected, but I would
>> like to use CSV for data exchange.
>>
>> I know that I can use
>> write.csv(x, "x.csv", row.names=FALSE)
>>
>> and it would work, but shouldn't that be the default behaviour?
>
> I don't think so. The CSV format is an export format which holds less
> information than a dataframe. By exporting the dataframe to CSV and
> importing the result, you are discarding information and you should expect to
> get something different.
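
To illustrate the point about lost information, here is a small sketch. The column names and the temporary file are made up for the example; the exact classes you get back depend on the R version and on the read.csv() arguments, but neither the Date class nor the factor level order is stored in the file.

```r
# A data frame with a Date column and a factor with non-alphabetical levels
x <- data.frame(id   = 1:3,
                when = as.Date(c("2011-01-11", "2011-01-12", "2011-01-13")),
                grp  = factor(c("lo", "hi", "lo"), levels = c("lo", "hi")))

f <- tempfile(fileext = ".csv")
write.csv(x, f, row.names = FALSE)
x2 <- read.csv(f)

sapply(x, class)    # classes before the round trip
sapply(x2, class)   # classes after: "when" is no longer a Date
identical(x, x2)    # the round trip does not reproduce x
```
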

You need to read it with read.csv("x.csv", row.names=1).

Nothing in the CSV format lets R know that the first column holds the row
names (in the format used by read.table, a header that is one column
short does). Now R could guess that a .csv file with an empty string for
the first column name is meant to contain the row names, but that would
be merely a guess based on one (barely documented for spreadsheets)
convention.

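
A minimal sketch of both conventions, assuming temporary files (the tempfile() paths and the names x2 and x3 are only for illustration):

```r
x <- data.frame(a = 1:10, b = runif(10))

# CSV: the row names are written as an ordinary first column with an
# empty header, so read.csv() has to be told which column holds them.
f_csv <- tempfile(fileext = ".csv")
write.csv(x, f_csv)
x2 <- read.csv(f_csv, row.names = 1)
str(x2)   # two variables, a and b; no extra X column

# write.table() format: the header line is one field shorter than the
# data lines, which read.table() takes to mean that the first column
# contains the row names.
f_txt <- tempfile(fileext = ".txt")
write.table(x, f_txt)
x3 <- read.table(f_txt, header = TRUE)
str(x3)
```

Note that the numeric column goes through a text representation in both cases, so comparing with all.equal() is safer than identical().
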
> If you want to save a dataframe to disk and read it back unchanged, you
> should use save() and load().

Or one of the other serialization options such as serialize() and
.saveRDS(). R's own admin uses .saveRDS() for such purposes.

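
A short sketch of these options, assuming temporary files. It uses saveRDS() and readRDS(), the public names under which .saveRDS() and .readRDS() are available from R 2.13.0 onwards; the variable names are illustrative only.

```r
x <- data.frame(a = 1:10, b = runif(10))

# save()/load() store the object together with its name; load()
# recreates a variable called "x" in the calling environment.
f_rda <- tempfile(fileext = ".RData")
save(x, file = f_rda)
x_orig <- x
rm(x)
load(f_rda)
identical(x, x_orig)

# saveRDS()/readRDS() store a single object without its name, so the
# result can be assigned to any variable.
f_rds <- tempfile(fileext = ".rds")
saveRDS(x, f_rds)
x2 <- readRDS(f_rds)
identical(x, x2)

# serialize() with connection = NULL returns the object as a raw vector.
raw_x <- serialize(x, NULL)
identical(unserialize(raw_x), x)
```
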
>
> Duncan Murdoch
>
>
>> And if this is not compliant with csv files, shouldn't the function
>> read.csv convert the first column into the row names?
>>
>> Cheers,
>>
>> Rainer
>>
>> --
>> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
>> Biology, UCT), Dipl. Phys. (Germany)
>>
>> Centre of Excellence for Invasion Biology
>> Natural Sciences Building
>> Office Suite 2039
>> Stellenbosch University
>> Main Campus, Merriman Avenue
>> Stellenbosch
>> South Africa
>>
>> Tel: +33 - (0)9 53 10 27 44
>> Cell: +27 - (0)8 39 47 90 42
>> Fax (SA): +27 - (0)8 65 16 27 82
>> Fax (D) : +49 - (0)3 21 21 25 22 44
>> Fax (FR): +33 - (0)9 58 10 27 44
>> email: Rainer at krugs.de
>>
>> Skype: RMkrug
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595