[R] Hand-crafting an .RData file

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Nov 9 09:05:38 CET 2009


The R 'save' format (as used for the saved workspace .RData) is 
described in the 'R Internals' manual (section 1.8).  It is intended 
for R objects, and you would first have to create one[*] of those in 
your other application.  That seems a lot of work.

The normal way to transfer numeric data between applications is to 
write a binary file: R can read such files with readBin(), and it also 
has wrappers/C-code to read a number of common binary data formats 
(e.g. those from SPSS).
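
As a minimal sketch of the readBin() route (the temporary file, the 
values and the choice of little-endian 8-byte doubles are assumptions 
made for illustration; in practice the other application would write 
the file and you would need to match its byte order and sizes):

```r
# Stand-in for the external application: write three doubles as raw
# 8-byte little-endian values to a temporary file.
path <- tempfile(fileext = ".bin")
x <- c(1.5, 2.25, -3)
con <- file(path, "wb")
writeBin(x, con, size = 8, endian = "little")
close(con)

# Read them back, stating the type, count, size and byte order explicitly.
con <- file(path, "rb")
y <- readBin(con, what = "double", n = length(x), size = 8,
             endian = "little")
close(con)
```

Note that readBin() has to be told how many items to read, so the 
writer should record the count somewhere (or you can derive it from 
the file size).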

With character data there are more issues (and more formats, see also 
readChar()), but load() is not particularly fast for those.

Ultimately the R functions pay a performance price for their 
flexibility so hand-crafted C code to read the format can be 
worthwhile: but see the comments below about whether I/O speed is 
that important.

[*] the 'save' format is a serialization of a single R object, even if 
you save many objects, since the object(s) are combined into a 
pairlist.
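
A small sketch of what that looks like from the R side (the object 
names and the temporary file are made up for illustration): two 
objects go into one uncompressed file, whose first few bytes are the 
header line described in 'R Internals', and load() restores both.

```r
a <- 1:3
b <- c("x", "y")
path <- tempfile(fileext = ".RData")

# Save two objects into a single uncompressed file.
save(a, b, file = path, compress = FALSE)

# The file starts with a short header line identifying the format.
header <- readChar(path, 5, useBytes = TRUE)

# Restore into a fresh environment; load() returns the names it created.
env <- new.env()
restored <- load(path, envir = env)
```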

On Sun, 8 Nov 2009, Adam D. I. Kramer wrote:

> Hello,
>
> 	I frequently have to export a large quantity of data from some
> source (for example, a database, or a hand-written perl script) and then
> read it into R.  This occasionally takes a lot of time; I'm usually using
> read.table("filename",comment.char="",quote="") to read the data once it is
> written to disk.

Specifying colClasses and nrows will usually help.
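
For instance, a sketch along these lines (the three-column 
tab-delimited file is invented for illustration, and written from R 
only so the example is self-contained):

```r
# Stand-in for the exported file: three columns, tab-delimited.
path <- tempfile(fileext = ".tsv")
df <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5),
                 label = c("a", "b", "c"))
write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Telling read.table() the column types and the number of rows saves
# it from guessing types and from growing its buffers as it reads.
dat <- read.table(path, header = TRUE, sep = "\t",
                  comment.char = "", quote = "",
                  colClasses = c("integer", "numeric", "character"),
                  nrows = 3)
```

nrows need not be exact: a mild over-estimate is fine.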

To read from a database, packages such as RODBC use binary data 
transfer: with suitable tuning this can be fast.

> 	However, I *know* that the program that generates the data is more
> or less just calling printf in a for loop to create the csv or tab-delimited
> file, writing, then having R parse it, which is pretty inefficient. Instead, 
> I am interested in figuring out how to write the data in .RData
> format so that I can load() it instead of read.table() it.

Without more details it is hard to say if it is inefficient. 
read.table() can read data pretty fast (millions of items per second) 
if used following the hints in the 'R Data' manual.  See e.g.
https://stat.ethz.ch/pipermail/r-devel/2004-December/031733.html

Almost anything non-trivial one might do with such data is much 
slower.  The trend is to write richer (and slower to read) data 
formats.

> 	Trolling the internet, however, has not suggested anything about the
> specification for an .RData file. Could somebody link me to a specification
> or some information that would instruct me on how to construct a .RData
> file (either compressed or uncompressed)?
>
> 	Also, I am open to other suggestions of how to get load()-like
> efficiency in some other way.
>
> Many thanks,
> Adam D. I. Kramer

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


