[Rd] Model object, when generated in a function, saves entire environment when saved

William Dunlap wdunlap at tibco.com
Wed Jul 27 20:19:44 CEST 2016


One way around this problem is to make a new environment whose
parent environment is .GlobalEnv and which contains only what
the call to lm() requires, and to compute lm() in that environment.  E.g.,

tfun1 <- function (subset)
{
    junk <- 1:1e+06
    env <- new.env(parent = globalenv())
    env$subset <- subset
    with(env, lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset))
}
Then we get
   > saveSize(tfun1(1:4)) # see below for def. of saveSize
   [1] 910
instead of the 2129743 bytes in the save file when using the naive method
(calling lm() directly in the body of the function, where junk is visible).

saveSize <- function (object) {
    tf <- tempfile(fileext = ".RData")
    on.exit(unlink(tf))
    save(object, file = tf)
    file.size(tf)
}
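
To see why this helps, one can look at the environment attached to the
model's terms, which is what save() writes out along with the fit. The
following is only a rough sketch: tfun0 is a hypothetical name for the
naive version, and it fills junk with runif(1e5) purely for illustration
(an assumption, not what was timed above). tfun1 and saveSize are repeated
from this message so that the block runs on its own.

```r
# Hypothetical naive version: lm() is called directly in the function
# body, so the formula's environment is the function's own frame.
tfun0 <- function(subset) {
    junk <- runif(1e5)   # illustrative large object (assumption)
    lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset)
}

# Workaround from this message: evaluate lm() in a small environment
# whose parent is the global environment.
tfun1 <- function(subset) {
    junk <- 1:1e+06
    env <- new.env(parent = globalenv())
    env$subset <- subset
    with(env, lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset))
}

saveSize <- function(object) {
    tf <- tempfile(fileext = ".RData")
    on.exit(unlink(tf))
    save(object, file = tf)
    file.size(tf)
}

fit0 <- tfun0(1:4)
fit1 <- tfun1(1:4)

# The environment recorded with each model's terms
env0 <- attr(terms(fit0), ".Environment")
env1 <- attr(terms(fit1), ".Environment")

ls(env0)   # the function's frame, including junk
ls(env1)   # only what was put into env
identical(parent.env(env1), globalenv())

# Compare the sizes of the save files
saveSize(fit0)
saveSize(fit1)
```

The exact byte counts will depend on the R version and on the compression
used by save(), so none are quoted here.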



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jul 27, 2016 at 10:48 AM, Kenny Bell <kmb56 at berkeley.edu> wrote:

> In the below, I generate a model from an environment that isn't
> .GlobalEnv with a large object that is unrelated to the model
> generation. It seems to save the irrelevant object unnecessarily. In
> my actual use case, I am running and saving many models in a loop that
> each use a single large data.frame (that gets collapsed into a small
> data.frame for estimation), so removing it isn't an option.
>
> In the case where the model exists in .GlobalEnv, everything is
> peachy. So replicating whatever happens when saving the model that was
> generated in .GlobalEnv at the return() stage of the function call
> would fix this problem.
>
> I was referred to this list from r-bugs. First time r-devel poster.
>
> Hope this helps,
>
> Kendon
>
> ```
> tmp_fun <- function(x){
>   iris_big <- lapply(1:10000, function(x) iris)
>   lm(Sepal.Length ~ Sepal.Width, data = iris)
> }
>
> out <- tmp_fun(1)
> object.size(out)
> # 48008
> save(out, file = "tmp.RData", compress = FALSE)
> file.size("tmp.RData")
> # 57196752 - way too big
>
> # Works fine when in .GlobalEnv
> iris_big <- lapply(1:10000, function(x) iris)
> out <- lm(Sepal.Length ~ Sepal.Width, data = iris)
>
> object.size(out)
> # 48008
> save(out, file = "tmp.RData", compress = FALSE)
> file.size("tmp.RData")
> # 16641 - good size.
> ```
>
