[R] save(), load(), saveRDS(), and readRDS()

Ivan Krylov kry|ov@r00t @end|ng |rom gm@||@com
Fri Sep 29 10:42:37 CEST 2023


On Thu, 28 Sep 2023 23:46:45 +0800
Shu Fai Cheung <shufai.cheung using gmail.com> wrote:

> In my personal work, I prefer using saveRDS() and readRDS() as I
> don't like the risk of overwriting anything in the global
> environment.

There's the load(file, e <- new.env()) idiom, but that's potentially
a lot to type.
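
A minimal sketch of that idiom, for illustration only (the temporary file and the object name x are made up for the example):

```r
# Hypothetical file and object names, for illustration only.
path <- tempfile(fileext = ".RData")
x <- c(1L, 2L, 3L)
save(x, file = path)

# Pretend the session now holds a different x that must survive.
x <- "something I do not want overwritten"

# The second argument of load() is the target environment. The
# assignment creates e and passes it in one go, so the restored
# objects land in e instead of the calling environment.
loaded <- load(path, e <- new.env())

loaded   # names of the restored objects, as returned by load()
e$x      # the saved value
x        # still the string assigned above
```

Since load() returns the names of the restored objects, the result can be used together with mget() or get() on e to pick out what is needed.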

Confusingly, ?save also says:

>> For saving single R objects, ‘saveRDS()’ is mostly preferable to
>> ‘save()’, notably because of the _functional_ nature of ‘readRDS()’,
>> as opposed to ‘load()’.

> The files produced by save have a header identifying the file type
> and so are better protected against erroneous use.

This header is also mentioned elsewhere in ?saveRDS:

>> ‘save’ writes a single line header (typically ‘"RDXs\n"’)

The difference between the save() header and the serialize() header is
that the save() header can be read independently of the machine
running the code: it's exactly 5 bytes. A few precisely defined
combinations of those 5 bytes identify how the rest of the file should
be interpreted (nowadays, most likely either "XDR format version 2" or
"XDR format version 3"); any other combination causes an error.

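To look at those 5 bytes directly, something like the following sketch should work. It assumes an uncompressed file (save() compresses by default, in which case the header is only visible after decompression, e.g. through gzfile()); the file and object names are made up:

```r
# Hypothetical file and object names, for illustration only.
path <- tempfile(fileext = ".RData")
x <- c(1L, 2L, 3L)

# compress = FALSE so the header can be read straight from the file.
save(x, file = path, compress = FALSE)

# Read the first 5 bytes as raw and show them as text.
header <- readBin(path, what = "raw", n = 5L)
header
rawToChar(header)   # likely "RDX2\n" or "RDX3\n", depending on the
                    # format version save() used
```
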
The serialize() header does contain enough information to describe the
format (there's the first byte choosing between ASCII/XDR/native binary
and a number of encoded integers describing the format version and the
version of R you need to parse it), but it's stored in terms of
serialized objects, so if you cannot for some reason decode them
properly, you won't be able to read the header. A bit of a Catch-22.
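
As a rough sketch of what that header looks like in the default XDR case, one can serialize to a raw vector and pick the first few fields apart by hand. The decode() helper is made up for this example, and the layout shown (a format letter, a newline, then three 4-byte integers) is my reading of the format, so treat it as an illustration rather than a parser:

```r
bytes <- serialize(c(1L, 2L, 3L), connection = NULL)  # xdr = TRUE by default

# First two bytes: the format letter followed by a newline.
format_tag <- rawToChar(bytes[1:2])
format_tag

# Then three 4-byte integers, big-endian in the XDR format: the format
# version, the version of R that wrote the data, and the minimal
# version of R needed to read it.
ints <- readBin(bytes[3:14], what = "integer", n = 3L, size = 4L,
                endian = "big")
ints[1]

# Hypothetical helper: R versions are packed as
# major * 65536 + minor * 256 + patch.
decode <- function(v) {
  paste(v %/% 65536L, (v %/% 256L) %% 256L, v %% 256L, sep = ".")
}
decode(ints[2:3])
```

Note that the integers can only be decoded here because the byte order is already known to be big-endian, which is exactly the circular dependency described above.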

> When will the problem mentioned in the warning occur? That is, when
> will a file saved by saveRDS() not be read correctly?

One example I can offer is when a dataset is saved using serialize(xdr
= FALSE) (which is not reachable using saveRDS()). The resulting file
format would be dependent on the native byte order of the CPU in your
computer. (Nowadays it's really hard to encounter a CPU that doesn't
use little-endian byte order, so this is doubly unlikely to happen in
practice.) Both save() and saveRDS() set xdr = TRUE and convert the
data to "network byte order" (big-endian) when saving, and back to
the native byte order when loading.
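
A small sketch of the difference, serializing the same vector to a raw vector both ways (the variable names are made up, and what the native variant looks like depends on the machine running it):

```r
x <- c(1L, 2L, 3L)

portable <- serialize(x, connection = NULL, xdr = TRUE)
native   <- serialize(x, connection = NULL, xdr = FALSE)

# The format letter in the first byte differs between the two.
rawToChar(portable[1])
rawToChar(native[1])

# The last four bytes hold the final element, 3L: big-endian in the
# XDR variant, whatever the CPU uses in the native one.
tail(portable, 4)
tail(native, 4)
.Platform$endian

# On the machine that wrote them, both variants round-trip.
identical(unserialize(portable), x)
identical(unserialize(native), x)
```

The native variant is only a problem if it is later read on a machine with a different byte order.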

The warning is relatively recent (May 2021). Perhaps Prof. Brian D.
Ripley (who made that change) will be able to explain it better.

-- 
Best regards,
Ivan


