[R] Object and file sizes

Duncan Murdoch
Fri Jun 28 15:26:47 CEST 2019


On 28/06/2019 7:35 a.m., Göran Broström wrote:
> Hello,
> 
> I have two large data frames, 'liss' (170 million obs, 8 variables) and
> 'fobb' (52 million obs, 8 variables, same as for 'liss'), and checking
> their sizes I get
> 
>   > object.size(liss)
> 7477492552 bytes
>   > object.size(fobb)
> 2494591736 bytes
> 
> Fair enough, but when I save them to disk (saveRDS), the size relation
> is reversed: 'fobb.rds' takes up 273 MB while 'liss.rds' uses 146 MB!
> 
> I was puzzled by this and thought that I had made a mistake in creating
> them, but the only explanation I can find for this is that 'liss'
> contains a lot more missing values.

saveRDS() uses compression (gzip) by default.  Compression works best
if there are a lot of repeated values; every NA is stored identically,
so a data frame with many NAs compresses well.  Other values may also
be repeated.
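
To illustrate the effect, here is a minimal sketch using made-up data
(the names 'dense' and 'sparse' and the 90% NA share are assumptions
for illustration, not the poster's data).  Both data frames take the
same space in memory, but the one that is mostly NA should produce a
much smaller .rds file:

```r
set.seed(1)
n <- 1e5

# Hypothetical data: one numeric column, no missing values
dense <- data.frame(x = rnorm(n))

# Hypothetical data: same length, but about 90% of the values are NA
sparse <- data.frame(x = rep(NA_real_, n))
idx <- sample(n, n / 10)
sparse$x[idx] <- rnorm(length(idx))

f_dense  <- tempfile(fileext = ".rds")
f_sparse <- tempfile(fileext = ".rds")
saveRDS(dense,  f_dense)    # default compress = TRUE (gzip)
saveRDS(sparse, f_sparse)

# In-memory sizes are the same: each value takes 8 bytes, NA or not
print(object.size(dense))
print(object.size(sparse))

# On-disk sizes differ, because long runs of NA compress well
print(file.size(f_dense))
print(file.size(f_sparse))
```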

If you use saveRDS(..., compress = FALSE), you'll get much larger
files, with sizes probably roughly proportional to the object.size()
results.
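
A quick way to check this on a small scale is sketched below.  The data
frame 'x' is a made-up example with only numeric and integer columns;
with character or factor columns the ratio may be further from 1, so
treat this as a rough check rather than a rule:

```r
set.seed(1)

# Hypothetical data: one double column and one integer column
x <- data.frame(a = rnorm(1e5), b = sample(1e5))

f_raw <- tempfile(fileext = ".rds")
f_gz  <- tempfile(fileext = ".rds")
saveRDS(x, f_raw, compress = FALSE)  # no compression
saveRDS(x, f_gz)                     # default gzip compression

# Ratio of uncompressed file size to in-memory size
ratio <- file.size(f_raw) / as.numeric(object.size(x))
print(ratio)

# The compressed file is smaller than the uncompressed one
print(file.size(f_gz) < file.size(f_raw))
```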

Duncan Murdoch


