[R] Object and file sizes

Göran Broström
Fri Jun 28 16:12:34 CEST 2019



On 2019-06-28 15:26, Duncan Murdoch wrote:
> On 28/06/2019 7:35 a.m., Göran Broström wrote:
>> Hello,
>>
>> I have two large data frames, 'liss' (170 million obs, 8 variables) and
>> 'fobb' (52 million obs, 8 variables, same as for 'liss'), and checking
>> their sizes I get
>>
>>   > object.size(liss)
>> 7477492552 bytes
>>   > object.size(fobb)
>> 2494591736 bytes
>>
>> Fair enough, but when I save them to disk (saveRDS), the size relation
>> is reversed: 'fobb.rds' takes up 273 MB while 'liss.rds' uses 146 MB!
>>
>> I was puzzled by this and thought that I had made a mistake in creating
>> them, but the only explanation I can find for this is that 'liss'
>> contains a lot more missing values.
> 
> saveRDS() uses compression by default.  Compression works best if there
> are a lot of repetitive values; every NA is the same, so that would help
> compression.  Other values may also be repeated.
> 
> If you use saveRDS(compress=FALSE), you'll get much larger results, 
> probably roughly proportional to the object.size() results.

Almost equal to the object.size() results: the uncompressed files on disk
are smaller by 2167 bytes and 2171 bytes, respectively. Thanks for the
explanation!
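
For anyone who wants to reproduce the effect on a small scale, here is a minimal sketch. The data frames 'mostly_na' and 'varied' are made-up stand-ins (not 'liss' and 'fobb'), and the 90% share of NA is an arbitrary assumption chosen only to make the difference visible.

```r
# Hypothetical stand-in data: two one-column data frames of equal length,
# one mostly NA and one with varied values.
set.seed(1)
n <- 1e5
mostly_na <- data.frame(x = ifelse(runif(n) < 0.9, NA_real_, rnorm(n)))
varied    <- data.frame(x = rnorm(n))

f_na  <- tempfile(fileext = ".rds")
f_var <- tempfile(fileext = ".rds")

# Sizes in memory, as reported by object.size()
in_memory <- c(mostly_na = as.numeric(object.size(mostly_na)),
               varied    = as.numeric(object.size(varied)))

# Default saveRDS(): compressed output
saveRDS(mostly_na, f_na)
saveRDS(varied, f_var)
compressed <- c(mostly_na = file.size(f_na), varied = file.size(f_var))

# Same objects written without compression
saveRDS(mostly_na, f_na, compress = FALSE)
saveRDS(varied, f_var, compress = FALSE)
uncompressed <- c(mostly_na = file.size(f_na), varied = file.size(f_var))

# One row per measurement, one column per data frame (all in bytes)
sizes <- rbind(in_memory, uncompressed, compressed)
print(sizes)

unlink(c(f_na, f_var))
```

With this setup the two objects have the same size in memory, the uncompressed files should come out close to that size, and the compressed file for the mostly-NA data should be clearly the smaller of the two.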

Göran

> 
> Duncan Murdoch


