[R] small object but huge RData file exported

Jeff Newmiller jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@
Thu Oct 21 08:36:54 CEST 2021


That depends on what was in the active environment when you created that formula. You would probably benefit from reading https://adv-r.hadley.nz/environments.html about now, though be warned that you are about to enter a complex interaction between functions, formulas and environments. A rational option is to consider not saving this object to a file at all, but instead to extract the values you need from it now and save those.
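A minimal sketch in R of inspecting and then detaching that environment, reusing Henrik's make_formula() example quoted below (the names f, env, before and after are illustrative only). Swapping in globalenv() is an assumption about what is acceptable in your case: it only works if the variables named in the formula can be found in the global environment whenever the formula is used later.

```r
# Reproduction of Henrik's example: the formula captures the function's frame
make_formula <- function() {
  large <- rnorm(1e6)   # ~8 MB vector living in the function's evaluation frame
  x ~ y                 # the formula's environment is that frame
}
f <- make_formula()

# What is stored in the formula's environment, and how to reach it
env <- environment(f)
print(ls(env))                                  # names of objects in that frame
print(object.size(get("large", envir = env)))   # the hidden payload

# Serialized size while the function frame is still attached
before <- length(serialize(f, connection = NULL))

# Detach the frame: globalenv() is written to files as a reference only
environment(f) <- globalenv()
after <- length(serialize(f, connection = NULL))

cat("serialized size before:", before, "bytes; after:", after, "bytes\n")
# save(f, file = "abc.RData") should now write only the small formula
```

For a fitted model the formula may be stored in more than one place, so following the advice above and saving only the values you need is usually simpler.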

On October 20, 2021 11:09:10 PM PDT, Jinsong Zhao <jszhao using yeah.net> wrote:
>This example demonstrates the same behaviour as in my question.
>
>If I
> > save(formula, file = "abc.RData")
>and then in a newly launched R session, I
> > load("abc.RData")
> > formula
>x ~ y
><environment: 0x00000000171e4be8>
>
>I want to know what is stored in <environment: 0x00000000171e4be8>,
>how to access it, or how to save the object without the environment.
>
>Best,
>Jinsong
>
>On 2021/10/21 4:06, Henrik Bengtsson wrote:
>> Example illustrating what Duncan says:
>> 
>>> make_formula <- function() { large <- rnorm(1e6); x ~ y }
>>> formula <- make_formula()
>> 
>> # "Apparent" size of object
>>> object.size(formula)
>> 728 bytes
>> 
>> # Actual serialization size
>>> length(serialize(formula, connection = NULL))
>> [1] 8000203
>> 
>> # A better size estimate
>>> lobstr::obj_size(formula)
>> 8,000,888 B
>> 
>> /Henrik
>> 
>> On Wed, Oct 20, 2021 at 12:57 PM Duncan Murdoch
>> <murdoch.duncan using gmail.com> wrote:
>>>
>>> On 20/10/2021 9:20 a.m., Jinsong Zhao wrote:
>>>> On 2021/10/20 21:05, Duncan Murdoch wrote:
>>>>> On 20/10/2021 8:57 a.m., Jinsong Zhao wrote:
>>>>>> Hi there,
>>>>>>
>>>>>> I have an RData file, created by save.image(), that is about
>>>>>> 74.0 MB (77,608,222 bytes).
>>>>>>
>>>>>> When it is loaded into R, I measured the size of each object with object.size():
>>>>>>
>>>>>>> object.size(combn.rda.m)
>>>>>> 105448 bytes
>>>>>>> object.size(cross)
>>>>>> 102064 bytes
>>>>>>> object.size(denitr.1)
>>>>>> 25032 bytes
>>>>>>> object.size(rda.denitr.1)
>>>>>> 600280 bytes
>>>>>>> object.size(xh)
>>>>>> 7792 bytes
>>>>>>> object.size(xh.x)
>>>>>> 6064 bytes
>>>>>>> object.size(xh.x.1)
>>>>>> 24144 bytes
>>>>>>> object.size(xh.x.2)
>>>>>> 24144 bytes
>>>>>>> object.size(xh.x.3)
>>>>>> 24144 bytes
>>>>>>> object.size(xh.y)
>>>>>> 2384 bytes
>>>>>>
>>>>>> These are all small objects.
>>>>>>
>>>>>> If I delete the largest one, "rda.denitr.1", and run save.image("xx.RData"),
>>>>>> the file is only 22.6 KB (23,244 bytes). All seems OK.
>>>>>>
>>>>>> However, when I run save(rda.denitr.1, file = "yy.RData"), the file is
>>>>>> 73.9 MB (77,574,869 bytes).
>>>>>>
>>>>>> I don't know why...
>>>>>>
>>>>>> Any hint?
>>>>>
>>>>> As the docs for object.size() say, "Exactly which parts of the memory
>>>>> allocation should be attributed to which object is not clear-cut."  In
>>>>> particular, if a function or formula has an associated environment, it
>>>>> isn't included, but it is sometimes saved in the image.
>>>>>
>>>>> So I'd suspect rda.denitr.1 contains something that references an
>>>>> environment, and it's an environment that would be saved.  (I forget the
>>>>> exact rules, but I think that means it's not the global environment and
>>>>> it's not a package environment.)
>>>>>
>>>>> Duncan Murdoch
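Duncan's recollection of the rules can be checked with a small sketch (all object names are hypothetical). As far as I know, serialize() and save() write the global, base and empty environments, package environments and namespaces as references, while any other environment, such as a function's evaluation frame, is written out together with everything it contains. object.size() ignores the environment in both cases.

```r
# Formula whose environment is the global environment (set explicitly,
# in case this code is not run at top level)
f_global <- x ~ y
environment(f_global) <- globalenv()

# Formula and closure created inside a function: both capture the call frame
make_objects <- function() {
  large <- rnorm(1e6)   # ~8 MB, never referenced by the returned objects
  list(formula = x ~ y, fun = function(z) z + 1)
}
objs <- make_objects()

ser_size <- function(obj) length(serialize(obj, connection = NULL))

sizes <- c(
  formula_global   = ser_size(f_global),        # globalenv: reference only
  formula_in_func  = ser_size(objs$formula),    # frame written by value
  closure_in_func  = ser_size(objs$fun),        # same frame, same cost
  closure_in_stats = ser_size(stats::median)    # namespace:stats: reference only
)
print(sizes)

# object.size() reports both local objects as small
print(sapply(objs, object.size))
```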
>>>>
>>>>
>>>> rda.denitr.1 is just a list of length 2:
>>>> rda.denitr.1[[1]] is a vector of length 10;
>>>> rda.denitr.1[[2]] is a list of length 10. rda.denitr.1[[2]][[1]]
>>>> to rda.denitr.1[[2]][[10]] are small RDA objects generated by rda() from
>>>> the vegan package.
>>>>
>>>> If I
>>>>    > a <- rda.denitr.1[[2]][[1]]
>>>>    > object.size(a)
>>>> 59896 bytes
>>>>    > save(a, file = "abc.RData")
>>>> the file is still large: 73.9 MB (77,536,611 bytes).
>>>>
>>>> Jinsong
>>>>
>>>
>>> The rda() function uses formulas.  If it saves the formula in the
>>> result, then it references the environment of that formula, typically
>>> the environment where the formula was created.
>>>
>>> Duncan Murdoch
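Because rda() results nest formulas inside lists, a sketch like the one below may help locate which components carry a captured environment. find_envs() is a hypothetical helper, not part of vegan or base R. Since vegan may not be installed, the demonstration uses lm() as a stand-in; lm() is known to keep its formula's environment in the terms component and in the terms attribute of the model frame.

```r
# Hypothetical helper: return the paths inside a nested object that carry an
# environment which serialize()/save() would write by value.
find_envs <- function(x, path = "x") {
  # Environments that, as far as I know, are written as references only
  by_reference <- function(env) {
    identical(env, globalenv()) || identical(env, baseenv()) ||
      identical(env, emptyenv()) || isNamespace(env) ||
      startsWith(environmentName(env), "package:")
  }
  hits <- character(0)
  env <- if (is.function(x)) environment(x) else attr(x, ".Environment", exact = TRUE)
  if (is.environment(env) && !by_reference(env)) hits <- c(hits, path)
  # Recurse into list elements (fitted model objects are usually lists)
  if (is.list(x)) {
    nms <- names(x)
    for (i in seq_along(x)) {
      piece <- if (!is.null(nms) && nzchar(nms[i])) paste0("$", nms[i]) else paste0("[[", i, "]]")
      hits <- c(hits, find_envs(x[[i]], paste0(path, piece)))
    }
  }
  # Recurse into attributes, e.g. the "terms" attribute of a model frame
  for (a in setdiff(names(attributes(x)), c(".Environment", "names"))) {
    hits <- c(hits, find_envs(attr(x, a, exact = TRUE),
                              sprintf('attr(%s, "%s")', path, a)))
  }
  hits
}

# Stand-in for rda(): a model fitted inside a function with a large local
fit_in_function <- function() {
  large <- rnorm(1e6)                             # unrelated 8 MB object
  d <- data.frame(y = rnorm(20), x = rnorm(20))
  lm(y ~ x, data = d)                             # formula environment = this frame
}
fit <- fit_in_function()
print(find_envs(fit))
print(length(serialize(fit, connection = NULL)))  # includes `large`
```

I have not checked which components of a vegan rda object hold the formula; the helper just reports whatever it finds. Resetting those environments can break update() or predict() on the model if they need objects from the original frame.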
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.


