[R] Object and file sizes
Göran Broström
gor@n@bro@trom @end|ng |rom umu@@e
Fri Jun 28 16:12:34 CEST 2019
On 2019-06-28 15:26, Duncan Murdoch wrote:
> On 28/06/2019 7:35 a.m., Göran Broström wrote:
>> Hello,
>>
>> I have two large data frames, 'liss' (170 million obs, 8 variables) and
>> 'fobb' (52 million obs, 8 variables, same as for 'liss'), and checking
>> their sizes I get
>>
>> > object.size(liss)
>> 7477492552 bytes
>> > object.size(fobb)
>> 2494591736 bytes
>>
>> Fair enough, but when I save them to disk (saveRDS), the size relation
>> is reversed: 'fobb.rds' takes up 273 MB while 'liss.rds' uses 146 MB!
>>
>> I was puzzled by this and thought that I had made a mistake in creating
>> them, but the only explanation I can find for this is that 'liss'
>> contains a lot more missing values.
>
> saveRDS() uses compression by default. Compression works best if there
> are a lot of repetitive values; every NA is the same, so that would help
> compression. Other values may also be repeated.
>
> If you use saveRDS(compress=FALSE), you'll get much larger results,
> probably roughly proportional to the object.size() results.
Almost equal to the object.size() results: the uncompressed files are 2167 and
2171 bytes smaller, respectively. Thanks for the explanation!
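A rough way to repeat this comparison is sketched below. It uses a small synthetic data frame (the name `df` and its columns are made up, not the real 'liss' or 'fobb'). The sketch saves the data frame with `compress = FALSE` and compares `file.size()` with `object.size()`. The two should agree to within a small overhead, but the exact byte difference will not match the figures quoted above.

```r
# Hypothetical stand-in for a large data frame; sample.int() is used instead of
# seq_len() because compact ALTREP sequences serialize to almost nothing.
set.seed(1)
n  <- 1e5
df <- data.frame(id = sample.int(n), x = rnorm(n))

in_memory <- as.numeric(object.size(df))   # bytes held in memory

f <- tempfile(fileext = ".rds")
saveRDS(df, f, compress = FALSE)           # no gzip: raw serialization
on_disk <- file.size(f)                    # bytes on disk
unlink(f)

c(object_size = in_memory, rds_uncompressed = on_disk,
  difference = in_memory - on_disk)
```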
Göran
>
> Duncan Murdoch
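The next sketch is a small synthetic check of the compression explanation. It assumes made-up data: one column of random doubles and one column that is about 90% NA. The real data frames are not used. Both objects have the same `object.size()`, and their uncompressed RDS files are about the same size. With the default gzip compression, only the NA-heavy one should shrink dramatically.

```r
set.seed(42)
n      <- 1e6
dense  <- data.frame(x = rnorm(n))                                    # few repeats
sparse <- data.frame(x = ifelse(runif(n) < 0.9, NA_real_, rnorm(n))) # mostly NA

# Size in bytes of the RDS file written with the given 'compress' setting
rds_size <- function(obj, compress) {
  f <- tempfile(fileext = ".rds")
  on.exit(unlink(f))
  saveRDS(obj, f, compress = compress)
  file.size(f)
}

sizes <- sapply(list(dense = dense, sparse = sparse), function(d)
  c(object_size  = as.numeric(object.size(d)),
    uncompressed = rds_size(d, FALSE),
    gzip         = rds_size(d, TRUE)))   # TRUE is saveRDS()'s default (gzip)
sizes
```

Repeated NA values give gzip long identical byte runs to exploit. Random doubles leave it little to work with. `saveRDS()` also accepts `compress = "bzip2"` or `"xz"` if a different trade-off between speed and file size is wanted.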
More information about the R-help mailing list