[R-SIG-Mac] Efficient Data Formats

Dan Putler dan.putler at sauder.ubc.ca
Wed Dec 30 20:13:42 CET 2009


Hi Francis,

This isn't really the correct list for this question since it is a
general question about the use of R rather than one specifically related
to the use of R on OS X.

As for your question, the answer is that it depends. If you are
working exclusively in R, then saving your data to disk in a
*.RData file via the save() function makes sense. If you are doing
spatial work and have GIS shapefile layers you will want to work
with at some point, or you want to use the data in other software
packages and need a file format that has some level of variable typing
(which csv files lack) and is fairly "transportable", then
writing the data to dBase format (via the write.dbf() function of the
foreign package) might also make sense.
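
As a rough sketch of both options (the data frame and the temporary
file paths below are made up for illustration, not taken from your
data), something along these lines should work. The write.dbf() part
only runs if the foreign package is installed:

```r
# Hypothetical purely numeric data set standing in for the csv data.
set.seed(1)
dat <- data.frame(id = 1:1000, x = rnorm(1000), y = runif(1000))

csv_path   <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")

write.csv(dat, csv_path, row.names = FALSE)
save(dat, file = rdata_path)   # binary format, compressed by default

# Compare the on-disk sizes (in bytes) of the two files.
file.size(csv_path)
file.size(rdata_path)

# load() restores the object under its original name, types intact.
original <- dat
rm(dat)
load(rdata_path)
identical(dat, original)

# dBase export for use in other software (needs the foreign package).
if (requireNamespace("foreign", quietly = TRUE)) {
  dbf_path <- tempfile(fileext = ".dbf")
  foreign::write.dbf(dat, dbf_path)
  str(foreign::read.dbf(dbf_path))   # inspect the types that come back
}
```

Note that load() puts the object back under the name it was saved
with, so it can silently overwrite an object of the same name in your
workspace.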

The real reason to move things out of csv format isn't space (if space
is the only issue, you can always store your data in a compressed
archive, such as a zip file, and uncompress files as needed). It is
that csv (like any other text-only format) lacks any variable typing,
so one often finds that, after reading a csv file into R, the resulting
data fields are not of the types one expected.
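
To illustrate the typing problem with a small made-up example (the
column names and values are hypothetical), an identifier with leading
zeros does not survive a round trip through csv unless the types are
stated explicitly when reading, for instance via the colClasses
argument of read.csv():

```r
# Hypothetical data: an identifier with leading zeros and a date column.
orig <- data.frame(zip  = c("02139", "98101"),
                   when = as.Date(c("2009-12-30", "2010-01-02")),
                   stringsAsFactors = FALSE)

csv_path <- tempfile(fileext = ".csv")
write.csv(orig, csv_path, row.names = FALSE)

# Default read: R guesses each column's type from the text alone.
guessed <- read.csv(csv_path)
str(guessed)   # zip is read as integer, so the leading zero is lost

# Stating the types explicitly with colClasses restores them.
typed <- read.csv(csv_path,
                  colClasses = c(zip = "character", when = "Date"))
str(typed)
```

With an *.RData file none of this bookkeeping is needed, since the
types are stored along with the data.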

Dan

On Wed, 2009-12-30 at 13:48 -0500, Francis Smart wrote:
> Hi,
> 
> I was wondering if someone could please direct me on how to choose the
> best format in order to save, open, and use data.  I am currently
> using datasets that are in csv format.  However, they are purely
> numeric and I think I should be able to reduce their size substantially.
> 
> Thank you for your time,
-- 
Dan Putler
Sauder School of Business
University of British Columbia


