[R] small object but huge RData file exported

Jinsong Zhao j@zh@o @end|ng |rom ye@h@net
Thu Oct 21 08:09:10 CEST 2021


This example shows the same behavior as the case in my question.

If I run
 > save(formula, file = "abc.RData")
and then, in a newly launched R session, run
 > load("abc.RData")
 > formula
the output is
x ~ y
<environment: 0x00000000171e4be8>

I want to know what is stored in the <environment: 0x00000000171e4be8>,
how to access it, and how to save the object without the environment.

Best,
Jinsong
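
A minimal sketch of one way to look inside the environment attached to a formula and to drop it before saving. It reuses make_formula() from Henrik's example quoted below; the names f, env, size_before and size_after are illustrative only. Re-pointing the formula at the global environment is an assumption about what is acceptable here: any variable the formula needs would afterwards be looked up from the global environment rather than from the original function frame.

```r
# Reproduce the situation: the formula captures the function's frame,
# which still holds the large temporary vector.
make_formula <- function() { large <- rnorm(1e6); x ~ y }
f <- make_formula()

# 1. Inspect what the formula's environment holds.
env <- environment(f)
ls(env)                                            # names of captured objects
sapply(ls(env), function(nm) object.size(get(nm, envir = env)))

# 2. Serialized size while the environment is still attached.
size_before <- length(serialize(f, connection = NULL))

# 3. Re-point the formula at the global environment, which is written
#    only as a reference when serialized, and measure again.
environment(f) <- globalenv()
size_after <- length(serialize(f, connection = NULL))

c(before = size_before, after = size_after)
```

After step 3, save(f, file = "abc.RData") should produce a small file, because the frame holding `large` is no longer reachable from f.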

On 2021/10/21 4:06, Henrik Bengtsson wrote:
> Example illustrating what Duncan says:
> 
>> make_formula <- function() { large <- rnorm(1e6); x ~ y }
>> formula <- make_formula()
> 
> # "Apparent" size of object
>> object.size(formula)
> 728 bytes
> 
> # Actual serialization size
>> length(serialize(formula, connection = NULL))
> [1] 8000203
> 
> # A better size estimate
>> lobstr::obj_size(formula)
> 8,000,888 B
> 
> /Henrik
> 
> On Wed, Oct 20, 2021 at 12:57 PM Duncan Murdoch
> <murdoch.duncan using gmail.com> wrote:
>>
>> On 20/10/2021 9:20 a.m., Jinsong Zhao wrote:
>>> On 2021/10/20 21:05, Duncan Murdoch wrote:
>>>> On 20/10/2021 8:57 a.m., Jinsong Zhao wrote:
>>>>> Hi there,
>>>>>
>>>>> I have an RData file, obtained with save.image(), with a size of about
>>>>> 74.0 MB (77,608,222 bytes).
>>>>>
>>>>> After loading it into R, I measured the size of each object with object.size():
>>>>>
>>>>>> object.size(combn.rda.m)
>>>>> 105448 bytes
>>>>>> object.size(cross)
>>>>> 102064 bytes
>>>>>> object.size(denitr.1)
>>>>> 25032 bytes
>>>>>> object.size(rda.denitr.1)
>>>>> 600280 bytes
>>>>>> object.size(xh)
>>>>> 7792 bytes
>>>>>> object.size(xh.x)
>>>>> 6064 bytes
>>>>>> object.size(xh.x.1)
>>>>> 24144 bytes
>>>>>> object.size(xh.x.2)
>>>>> 24144 bytes
>>>>>> object.size(xh.x.3)
>>>>> 24144 bytes
>>>>>> object.size(xh.y)
>>>>> 2384 bytes
>>>>>
>>>>> These are all small objects.
>>>>>
>>>>> If I delete the largest one, "rda.denitr.1", and run save.image("xx.RData"),
>>>>> the file has a size of 22.6 KB (23,244 bytes). All seems OK.
>>>>>
>>>>> However, when I save(rda.denitr.1, file = "yy.RData"), then it has the
>>>>> size of 73.9 MB (77,574,869 bytes).
>>>>>
>>>>> I don't know why...
>>>>>
>>>>> Any hint?
>>>>
>>>> As the docs for object.size() say, "Exactly which parts of the memory
>>>> allocation should be attributed to which object is not clear-cut."  In
>>>> particular, if a function or formula has an associated environment, it
>>>> isn't included, but it is sometimes saved in the image.
>>>>
>>>> So I'd suspect rda.denitr.1 contains something that references an
>>>> environment, and it's an environment that would be saved.  (I forget the
>>>> exact rules, but I think that means it's not the global environment and
>>>> it's not a package environment.)
>>>>
>>>> Duncan Murdoch
>>>
>>>
>>> rda.denitr.1 is only a list of length 2:
>>> rda.denitr.1[[1]] is a vector of length 10;
>>> rda.denitr.1[[2]] is a list of length 10. rda.denitr.1[[2]][[1]]
>>> to rda.denitr.1[[2]][[10]] are small RDA objects generated by rda() from
>>> the vegan package.
>>>
>>> If I
>>>    > a <- rda.denitr.1[[2]][[1]]
>>>    > object.size(a)
>>> 59896 bytes
>>>    > save(a, file = "abc.RData")
>>> It also has a large size of 73.9 MB (77,536,611 bytes)
>>>
>>> Jinsong
>>>
>>
>> The rda() function uses formulas.  If it saves the formula in the
>> result, then it references the environment of that formula, typically
>> the environment where the formula was created.
>>
>> Duncan Murdoch
>>
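
A rough sketch of how one might locate the component that drags an environment along, in the spirit of Duncan's explanation above. The object here is a stand-in, not a real vegan result: fit_model(), component_sizes() and the element names coef and terms are hypothetical, chosen only to mimic a fitted-model list that stores its formula.

```r
# Hypothetical helper: serialized size (in bytes) of each top-level element.
component_sizes <- function(obj) {
  vapply(obj,
         function(el) as.numeric(length(serialize(el, connection = NULL))),
         numeric(1))
}

# Stand-in for a model-fitting function that keeps its formula in the result.
fit_model <- function() {
  big <- rnorm(1e6)               # large temporary in the function frame
  list(coef  = c(a = 1, b = 2),   # small payload
       terms = y ~ x)             # formula captures the frame, including `big`
}

fit <- fit_model()

object.size(fit)                  # looks small: environments are not counted
sizes <- component_sizes(fit)
sizes                             # the formula element dominates
names(sizes)[which.max(sizes)]
```

Applying the same helper to a real fitted object should point to whichever element carries the captured environment, assuming that environment is reachable from a top-level element.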
