[Rd] RData File Specification?
Simon Urbanek
simon.urbanek at r-project.org
Sat Aug 25 04:01:12 CEST 2007
Ian,
On Aug 23, 2007, at 4:21 PM, Cook, Ian wrote:
> I am developing a tool for converting a large data frame stored in
> an uncompressed binary (XDR) RData file to a delimited text file.
> The data frame is too large to load() and extract rows from on a
> typical PC. I'm looking to parse through the file and extract
> individual entries without loading the whole thing into memory.
>
> In terms of some C source functions, instead of doing RestoreToEnv
> (R_Unserialize(connection)) which is essentially what load() does,
> I'm looking to get the documentation I would need to build a
> function "SaveToCSV()" so that I could do SaveToCSV(R_Unserialize
> (connection)).
>
> Where can I get documentation on the RData file format? Does a
> spec document exist?
>
I don't think so - basically the sources are all the documentation
I'm aware of. It's a bit messy, because R supports so many old formats.
However, if you want a stand-alone program that handles
(uncompressed) XDR2 only, then I may have saved you a bit of work. I
have a utility (based on the R sources) that allows you to scan
through XDR2 files and to extract individual objects into a separate
XDR2 file (this happens to be quite useful when you have a workspace
that doesn't load into R and yet you want to save some pieces of it).
Have a look at
http://urbanek.info/rdcopy.c
You can run it as "./rdcopy foo" to list the objects, as
"./rdcopy foo -v" to show the full structure (all SEXPs with their
offsets), or as "./rdcopy foo bar 19" to copy the SEXP at offset 19 from
foo into a separate XDR2 file bar (use the offsets from the first call
to copy entire objects).
It's not perfect, but it serves its purpose (it resolves references by
copying them instead of re-indexing, but it doesn't detect loops).
Maybe it helps, even though the task you describe is still far from
trivial.
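
For orientation, here is a minimal sketch in C of how the start of an
uncompressed XDR2 file could be decoded. It is not taken from rdcopy.c.
The names rdx_header, xdr_int, rdx_parse_header, sexp_flags and
decode_flags are hypothetical, and the layout is an assumption based on
a reading of the R sources (saveload.c and serialize.c): a 5-byte magic
"RDX2\n", the format marker "X\n", three big-endian integers, and then
the first SEXP, where every SEXP begins with a packed flags word. That
adds up to 19 bytes of preamble, which appears consistent with the
offset 19 used in the example above. Verify it against the sources of
the R version you target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fields assumed to sit at the start of an uncompressed XDR2 save file. */
typedef struct {
    int32_t format_version;  /* serialization format version (2 for XDR2) */
    int32_t writer_version;  /* R version that wrote the file, packed */
    int32_t min_reader;      /* oldest R version able to read it, packed */
} rdx_header;

/* Flags word assumed to precede every serialized SEXP. */
typedef struct {
    int type;       /* SEXP type code, low 8 bits */
    int is_object;  /* bit 8 */
    int has_attr;   /* bit 9: an attribute list follows the payload */
    int has_tag;    /* bit 10: a tag (e.g. a variable name) is present */
    int levels;     /* general-purpose bits, bit 12 and above */
} sexp_flags;

/* Decode a 4-byte big-endian (XDR) integer. */
int32_t xdr_int(const unsigned char *p)
{
    return (int32_t)(((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] << 8)  |  (uint32_t)p[3]);
}

/* Parse the 19-byte preamble: "RDX2\n", "X\n", then three XDR integers.
   Returns 0 on success, -1 if the buffer is too short or not XDR2. */
int rdx_parse_header(const unsigned char *buf, size_t len, rdx_header *h)
{
    if (len < 19) return -1;
    if (memcmp(buf, "RDX2\n", 5) != 0) return -1;
    if (buf[5] != 'X' || buf[6] != '\n') return -1;
    h->format_version = xdr_int(buf + 7);
    h->writer_version = xdr_int(buf + 11);
    h->min_reader     = xdr_int(buf + 15);
    return 0;
}

/* Split a flags word into its parts. */
sexp_flags decode_flags(int32_t word)
{
    uint32_t w = (uint32_t)word;
    sexp_flags f;
    f.type      = (int)(w & 0xFF);
    f.is_object = (int)((w >> 8) & 1);
    f.has_attr  = (int)((w >> 9) & 1);
    f.has_tag   = (int)((w >> 10) & 1);
    f.levels    = (int)(w >> 12);
    return f;
}
```

To try it, fread() the first 19 bytes of the file into a buffer and pass
them to rdx_parse_header(); the next 4 bytes can then go through
xdr_int() and decode_flags(). The version fields are packed as
major * 65536 + minor * 256 + patch. Walking the rest of the file means
dispatching on the type code to skip or read each payload, which is the
part where the R sources remain the only reference.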
Cheers,
Simon