[R] small object but huge RData file exported
Jinsong Zhao
jszhao at yeah.net
Thu Oct 21 10:51:25 CEST 2021
Thanks a lot for your kind replies and explanations. Environments are
hard for me to understand. I will follow the advice from Duncan and Jeff.
Best,
Jinsong
On 2021/10/21 14:36, Jeff Newmiller wrote:
> That depends on what was in the active environment when you created that formula. You would probably benefit from reading https://adv-r.hadley.nz/environments.html about now, though you are about to enter a complex interaction between functions, formulas and environments. A rational option is to consider not saving this object to a file at all, but instead to extract the values you need from it now and save those.
>
> On October 20, 2021 11:09:10 PM PDT, Jinsong Zhao <jszhao using yeah.net> wrote:
>> This example demonstrates the same behaviour as in my question.
>>
>> If I
>>> save(formula, file = "abc.RData")
>> and then, in a newly launched R session, I
>>> load("abc.RData")
>>> formula
>> x ~ y
>> <environment: 0x00000000171e4be8>
>>
>> I want to know what is stored in the <environment: 0x00000000171e4be8>,
>> how to access it, and how to save the object without the environment.
>>
>> Best,
>> Jinsong
>>
>> On 2021/10/21 4:06, Henrik Bengtsson wrote:
>>> Example illustrating what Duncan says:
>>>
>>>> make_formula <- function() { large <- rnorm(1e6); x ~ y }
>>>> formula <- make_formula()
>>>
>>> # "Apparent" size of object
>>>> object.size(formula)
>>> 728 bytes
>>>
>>> # Actual serialization size
>>>> length(serialize(formula, connection = NULL))
>>> [1] 8000203
>>>
>>> # A better size estimate
>>>> lobstr::obj_size(formula)
>>> 8,000,888 B
>>>
>>> /Henrik
>>>
>>> On Wed, Oct 20, 2021 at 12:57 PM Duncan Murdoch
>>> <murdoch.duncan using gmail.com> wrote:
>>>>
>>>> On 20/10/2021 9:20 a.m., Jinsong Zhao wrote:
>>>>> On 2021/10/20 21:05, Duncan Murdoch wrote:
>>>>>> On 20/10/2021 8:57 a.m., Jinsong Zhao wrote:
>>>>>>> Hi there,
>>>>>>>
>>>>>>> I have an RData file, obtained by save.image(), with a size of about
>>>>>>> 74.0 MB (77,608,222 bytes).
>>>>>>>
>>>>>>> After loading it into R, I measured the size of each object with object.size():
>>>>>>>
>>>>>>>> object.size(combn.rda.m)
>>>>>>> 105448 bytes
>>>>>>>> object.size(cross)
>>>>>>> 102064 bytes
>>>>>>>> object.size(denitr.1)
>>>>>>> 25032 bytes
>>>>>>>> object.size(rda.denitr.1)
>>>>>>> 600280 bytes
>>>>>>>> object.size(xh)
>>>>>>> 7792 bytes
>>>>>>>> object.size(xh.x)
>>>>>>> 6064 bytes
>>>>>>>> object.size(xh.x.1)
>>>>>>> 24144 bytes
>>>>>>>> object.size(xh.x.2)
>>>>>>> 24144 bytes
>>>>>>>> object.size(xh.x.3)
>>>>>>> 24144 bytes
>>>>>>>> object.size(xh.y)
>>>>>>> 2384 bytes
>>>>>>>
>>>>>>> These are all small objects.
>>>>>>>
>>>>>>> If I delete the largest one, "rda.denitr.1", and call save.image("xx.RData"),
>>>>>>> the file has a size of 22.6 KB (23,244 bytes). All seems OK.
>>>>>>>
>>>>>>> However, when I save(rda.denitr.1, file = "yy.RData"), the file has a
>>>>>>> size of 73.9 MB (77,574,869 bytes).
>>>>>>>
>>>>>>> I don't know why...
>>>>>>>
>>>>>>> Any hint?
>>>>>>
>>>>>> As the docs for object.size() say, "Exactly which parts of the memory
>>>>>> allocation should be attributed to which object is not clear-cut." In
>>>>>> particular, if a function or formula has an associated environment, it
>>>>>> isn't included, but it is sometimes saved in the image.
>>>>>>
>>>>>> So I'd suspect rda.denitr.1 contains something that references an
>>>>>> environment, and it's an environment that would be saved. (I forget the
>>>>>> exact rules, but I think that means it's not the global environment and
>>>>>> it's not a package environment.)
>>>>>>
>>>>>> Duncan Murdoch
>>>>>
>>>>>
>>>>> The rda.denitr.1 is only a list with length 2:
>>>>> rda.denitr.1[[1]] is a vector with length 10;
>>>>> rda.denitr.1[[2]] is a list with length 10. rda.denitr.1[[2]][[1]]
>>>>> to rda.denitr.1[[2]][[10]] are small RDA objects generated by rda() from
>>>>> the vegan package.
>>>>>
>>>>> If I
>>>>> > a <- rda.denitr.1[[2]][[1]]
>>>>> > object.size(a)
>>>>> 59896 bytes
>>>>> > save(a, file = "abc.RData")
>>>>> The file also has a large size of 73.9 MB (77,536,611 bytes)
>>>>>
>>>>> Jinsong
>>>>>
>>>>
>>>> The rda() function uses formulas. If it saves the formula in the
>>>> result, then it references the environment of that formula, typically
>>>> the environment where the formula was created.
>>>>
>>>> Duncan Murdoch
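
For readers who land on this thread with the same question (what is stored in the formula's environment, how to access it, and how to save the object without it), here is a minimal sketch that reuses the make_formula() example quoted above. It works on a bare formula only; where such a formula sits inside an rda() result is not shown in the thread, so that part is left out. The names env, f_small, size_before and size_after are made up for illustration.

```r
# Same construction as the make_formula() example in this thread:
# the formula is created inside a function that also holds a large vector.
make_formula <- function() {
  large <- rnorm(1e6)  # lives in the function's environment
  x ~ y                # the formula keeps a reference to that environment
}
f <- make_formula()

# What is stored in the environment attached to the formula?
env <- environment(f)
ls(env)                  # names of the objects it holds
object.size(env$large)   # size of the captured vector

# Serialized size with the original environment attached.
size_before <- length(serialize(f, connection = NULL))

# Work on a copy and point it at the global environment instead.
# The global environment is stored as a reference, not by content.
f_small <- f
environment(f_small) <- globalenv()
size_after <- length(serialize(f_small, connection = NULL))

c(before = size_before, after = size_after)
```

One caveat: code that later evaluates the formula (for example to look up variables that only existed in the original environment) may no longer find them after the environment is replaced, so this is best done on a copy and only when the captured objects are not needed.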