[Rd] Model object, when generated in a function, saves entire environment when saved
Kenny Bell
kmbell56 at gmail.com
Wed Jul 27 21:31:52 CEST 2016
Thanks so much for all this.
I'm going with the first solution, since I want the terms object to
come along so that predict() still works.
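
A minimal sketch of why the first solution keeps predict() working, assuming the tfun1 definition from the first reply quoted below. The names fit, env_names, and pred are illustrative and not from the original messages. The fitted object still carries its terms, but the environment attached to them holds only `subset`, not the large unrelated object.

```r
# tfun1 as in the first suggestion: fit lm() inside a small environment
# whose parent is the global environment.
tfun1 <- function(subset) {
  junk <- 1:1e6                           # large object unrelated to the model
  env <- new.env(parent = globalenv())
  env$subset <- subset
  with(env, lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset))
}

fit <- tfun1(1:50)

# The environment attached to the terms object is the small one created
# above, so it lists `subset` but not `junk`.
env_names <- ls(environment(terms(fit)))
print(env_names)

# Because the terms object is kept, predict() still works on new data.
pred <- predict(fit, newdata = data.frame(Sepal.Width = c(3, 3.5)))
print(length(pred))
```

The second solution (saving only the coefficients) would be smaller still, but predict() would no longer have a model object to work from.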
On Wed, Jul 27, 2016 at 12:28 PM, William Dunlap via R-devel <r-devel at r-project.org> wrote:
> Another solution is to only save the parts of the model object that
> interest you. As long as they don't include the formula (which is
> what drags along the environment it was created in), you will
> save space. E.g.,
>
> tfun2 <- function(subset) {
>     junk <- 1:1e6
>     list(subset=subset, lm(Sepal.Length ~ Sepal.Width, data=iris,
>                            subset=subset)$coef)
> }
>
> saveSize(tfun2(1:4))
> #[1] 152
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Jul 27, 2016 at 11:19 AM, William Dunlap <wdunlap at tibco.com>
> wrote:
>
> > One way around this problem is to make a new environment whose
> > parent environment is .GlobalEnv and which contains only what the
> > call to lm() requires, and to compute lm() in that environment.
> > E.g.,
> >
> > tfun1 <- function (subset)
> > {
> >     junk <- 1:1e+06
> >     env <- new.env(parent = globalenv())
> >     env$subset <- subset
> >     with(env, lm(Sepal.Length ~ Sepal.Width, data = iris,
> >                  subset = subset))
> > }
> > Then we get
> > > saveSize(tfun1(1:4)) # see below for def. of saveSize
> > [1] 910
> > instead of the 2129743 bytes in the save file when using the naive
> > method.
> >
> > saveSize <- function (object) {
> >     tf <- tempfile(fileext = ".RData")
> >     on.exit(unlink(tf))
> >     save(object, file = tf)
> >     file.size(tf)
> > }
> >
> >
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> > On Wed, Jul 27, 2016 at 10:48 AM, Kenny Bell <kmb56 at berkeley.edu> wrote:
> >
> >> In the below, I generate a model from an environment that isn't
> >> .GlobalEnv with a large object that is unrelated to the model
> >> generation. It seems to save the irrelevant object unnecessarily. In
> >> my actual use case, I am running and saving many models in a loop that
> >> each use a single large data.frame (that gets collapsed into a small
> >> data.frame for estimation), so removing it isn't an option.
> >>
> >> In the case where the model exists in .GlobalEnv, everything is
> >> peachy. So replicating whatever happens when saving the model that was
> >> generated in .GlobalEnv at the return() stage of the function call
> >> would fix this problem.
> >>
> >> I was referred to this list from r-bugs. First time r-devel poster.
> >>
> >> Hope this helps,
> >>
> >> Kendon
> >>
> >> ```
> >> tmp_fun <- function(x){
> >>   iris_big <- lapply(1:10000, function(x) iris)
> >>   lm(Sepal.Length ~ Sepal.Width, data = iris)
> >> }
> >>
> >> out <- tmp_fun(1)
> >> object.size(out)
> >> # 48008
> >> save(out, file = "tmp.RData", compress = FALSE)
> >> file.size("tmp.RData")
> >> # 57196752 - way too big
> >>
> >> # Works fine when in .GlobalEnv
> >> iris_big <- lapply(1:10000, function(x) iris)
> >> out <- lm(Sepal.Length ~ Sepal.Width, data = iris)
> >>
> >> object.size(out)
> >> # 48008
> >> save(out, file = "tmp.RData", compress = FALSE)
> >> file.size("tmp.RData")
> >> # 16641 - good size.
> >> ```
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >
> >
>
>