[Rd] Model object, when generated in a function, saves entire environment when saved

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Wed Jan 29 21:24:22 CET 2020


On 29/01/2020 2:25 p.m., Kenny Bell wrote:
> Reviving an old thread. I haven't noticed this be a problem for a while
> when saving RDS's which is great. However, I noticed the problem again when
> saving `qs` files (https://github.com/traversc/qs) which is an RDS
> replacement with a fast serialization / compression system.
> 
> I'd like to get an idea of what change was made within R to address this
> issue for `saveRDS`. My thought is that this will help the author of the
> `qs` package do something similar. I have had a browse through the release
> notes for the last few years (Ctrl-F-ing "environment") and couldn't see it.

The vector 1:1e+08 is stored very compactly in recent R versions (since 
R 3.5.0: the start and end plus a marker that it's a sequence), and it 
appears saveRDS takes advantage of that while qs::qsave doesn't.  That's 
not a very useful test, because environments typically aren't filled 
with long sequence vectors.  If you replace the line

   junk <- 1:1e+08

with

   junk <- runif(1e+08)

you'll see drastically different results (the "#>" lines appear to be 
the sizes from your original example, carried along for comparison):

 > save_size_qs(normal_lm())
[1] 417953609
 > #> [1] 848396
 > save_size_rds(normal_lm())
[1] 532614827
 > #> [1] 4163
 > save_size_qs(normal_ggplot())
[1] 417967987
 > #> [1] 857446
 > save_size_rds(normal_ggplot())
[1] 532624477
 > #> [1] 12895
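
The effect can be reproduced on a smaller scale without writing any files. The following is only a sketch under stated assumptions: `ser_size()` and `fit_in_function()` are hypothetical helpers (not the `save_size_*` or `normal_lm()` functions from the original example), `n` is reduced to 1e6 so it runs quickly, and sizes are measured on the uncompressed output of `serialize()` rather than on a compressed file.

```r
# Hypothetical helper: uncompressed serialized size of an object, in bytes.
# version = 3 is the serialization format that can keep compact sequences compact.
ser_size <- function(x) length(serialize(x, connection = NULL, version = 3))

n <- 1e6  # much smaller than the 1e+08 used in the thread

# Part 1: the two kinds of "junk" on their own.
size_seq  <- ser_size(1:n)       # compact sequence: a short description only
size_rand <- ser_size(runif(n))  # n doubles: about 8 bytes each

# Part 2: a model fitted inside a function. The formula stored in the
# lm object keeps a reference to the function's frame, so `junk` is
# serialized together with the model.
fit_in_function <- function(random) {
  junk <- if (random) runif(n) else 1:n
  lm(Sepal.Length ~ Sepal.Width, data = iris)
}

model_seq  <- ser_size(fit_in_function(random = FALSE))
model_rand <- ser_size(fit_in_function(random = TRUE))

c(size_seq = size_seq, size_rand = size_rand,
  model_seq = model_seq, model_rand = model_rand)
```

With the sequence, the captured environment adds almost nothing to the serialized model; with the random vector, the model grows by roughly the size of `junk`. So the environment is being saved in both cases; the sequence just hides it.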

Duncan Murdoch


