[Rd] how to control the environment of a formula

Thomas Alexander Gerds tag at biostat.ku.dk
Thu Apr 18 17:39:04 CEST 2013


Dear Duncan,

Thank you for taking the time to answer my questions! It will be quite
some work to delete all the objects generated inside the function,
but if there is no other way to avoid a large environment, then this
is what I will do.

Cheers
Thomas

Duncan Murdoch <murdoch.duncan at gmail.com> writes:

> On 13-04-18 1:09 AM, Thomas Alexander Gerds wrote:
>> Dear List
>> I have noticed that objects generated with one of my packages
>> take up a lot of space when saved to disk (object.size did not show
>> this!).
>> Some debugging revealed that formula and call objects carry the
>> full environment of subroutines along, including even stuff not
>> needed by the formula or call. Here is a sketch of the problem:
>> ,----
>> | test <- function(x){
>> |   x <- rnorm(1000000)
>> |   out <- list()
>> |   out$f <- a~b
>> |   out
>> | }
>> | v <- test(1)
>> | save(v,file="~/tmp/v.rda")
>> | system("ls -lah ~/tmp/v.rda")
>> | -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
>> `----
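The sketch above writes a file to ~/tmp. A minimal in-memory variant, assuming that the length of the uncompressed serialize() output is an acceptable proxy for what save() writes (save() compresses by default, so the file on disk is smaller), compares what object.size() reports with the serialized size:

```r
# Same construction as in the sketch above: the formula is created inside
# test(), so its environment is test()'s evaluation frame, which still holds x.
test <- function(x) {
  x <- rnorm(1000000)
  out <- list()
  out$f <- a ~ b
  out
}

v <- test(1)

# object.size() does not follow the formula's environment ...
size_reported <- as.numeric(object.size(v))

# ... but serialization (what save() does before compressing) includes it,
# so the 1e6 doubles in x (about 8 MB) are written out as well.
size_serialized <- length(serialize(v, connection = NULL))

cat("object.size:", size_reported, "bytes\n")
cat("serialized: ", size_serialized, "bytes\n")
```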
>> I tried to replace the a~b in the line out$f <- a~b by
>> ,----
>> | as.formula(a~b,env=emptyenv()) or as.formula(a~b,env=NULL)
>> `----
>> without the desired effect. Instead, adding either
>> ,----
>> | environment(out$f) <- emptyenv() or environment(out$f) <- NULL
>> `----
>> has the desired effect (i.e., the saved file is smaller).
>> Unfortunately, there is a new problem:
>> ,----
>> | test <- function(x){
>> |   x <- rnorm(1000000)
>> |   out <- list()
>> |   out$f <- a~b
>> |   environment(out$f) <- emptyenv()
>> |   out
>> | }
>> | d <- data.frame(a=1,b=1)
>> | v <- test(1)
>> | model.frame(v$f,data=d)
>> | Error in eval(expr, envir, enclos) : could not find function "list"
>> `----
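The error message is consistent with how model.frame() appears to evaluate a formula's variables: roughly, it evaluates a call such as list(a, b) with the data frame as envir and the formula's environment as enclos. So a and b are found in d, while the function list itself has to be found through the formula's environment. A minimal sketch reproducing just that lookup with base eval() (the names d and expr are illustrative, not taken from model.frame's source):

```r
# The same data frame as in the sketch above.
d <- data.frame(a = 1, b = 1)

# Roughly the call that model.frame() builds from the formula a ~ b.
expr <- quote(list(a, b))

# Empty enclosure: a and b are found in d, but the function `list` is not,
# so eval() fails; tryCatch() returns the error message as a string.
res_empty <- tryCatch(eval(expr, d, emptyenv()), error = conditionMessage)
res_empty

# Base environment as enclosure: `list` is found, and a, b still come from d.
res_base <- eval(expr, d, baseenv())
res_base
```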
>> Same with NULL in place of emptyenv().
>> Finally, using .GlobalEnv in place of emptyenv() seems to remove both
>> problems.
>
> But it will cause other, less obvious problems.  In a formula, the
> symbols mean something.  By setting the environment to .GlobalEnv
> you're changing the meaning.  You'll get nonsense in certain cases
> when functions look up the meaning of those symbols and find the wrong
> thing. (I don't have an example at hand, but I imagine it would be
> easy to put one together with update().)
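Since no example is given here, the following is a hypothetical one (make_formula and its data are made up for illustration) that uses model.frame() rather than update(). Once the formula's environment is reset to the global environment, the same symbols silently resolve to different objects, with no error:

```r
# Hypothetical helper: the formula's variables live only in this
# function's evaluation frame.
make_formula <- function() {
  y <- c(1, 2, 3, 4)
  x <- c(2, 4, 6, 8)
  y ~ x
}

f <- make_formula()

# Unrelated global objects that happen to share the names x and y.
x <- c(10, 20, 30, 40)
y <- c(0, 0, 0, 0)

# With its original environment, model.frame() finds the function's x and y.
mf_ok <- model.frame(f)
mf_ok$x

# After switching to .GlobalEnv, the same formula picks up the global x and y.
g <- f
environment(g) <- globalenv()
mf_bad <- model.frame(g)
mf_bad$x
```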
>
>> My questions:
>> 1) why does the argument env of as.formula have no effect?
>
> Because the first argument already had an associated environment.  You
> passed a ~ b, which is evaluated to a formula; calling as.formula on a
> formula does nothing. The env argument is only used when a new formula
> needs to be constructed.  (You can see this in the source code;
> as.formula is a very simple function.)
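A short check of this behaviour, assuming the base R semantics of as.formula() described above: a formula passed in keeps its own environment, while a character string has to be turned into a new formula and therefore receives env:

```r
# Formula created inside local(): its environment is the local() frame.
f1 <- local({
  big <- rnorm(10)
  as.formula(a ~ b, env = emptyenv())  # env ignored: a ~ b is already a formula
})

# A character string must be converted, so env is applied here.
f2 <- as.formula("a ~ b", env = emptyenv())

identical(environment(f1), emptyenv())  # FALSE
identical(environment(f2), emptyenv())  # TRUE
```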
>
>> 2) is there a better way to tell formula not to copy unrelated stuff
>> into the associated environment?
>
> Yes, delete it.  For example, you could write your function as
>
>   test <- function(x){
>     x <- rnorm(1000000)
>     out <- list()
>     out$f <- a~b
>     rm(x)
>     out
>   }
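Deleting every temporary object by name can be tedious in a longer function. One possible generalisation of rm(x), sketched here under the assumption that only out needs to survive, removes everything in the function's frame except the return value:

```r
test <- function(x) {
  x <- rnorm(1000000)
  out <- list()
  out$f <- a ~ b
  # Remove every (non-hidden) local object except `out` from this frame,
  # so the formula's environment no longer holds the large vector.
  rm(list = setdiff(ls(), "out"))
  out
}

v <- test(1)

# The serialized object is now small, and only `out` remains in the
# formula's environment.
size_serialized <- length(serialize(v, connection = NULL))
ls(environment(v$f))
```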
>
>> 3) why does object.size not show the size of the environments that
>> formulas can carry along?
>
> Because many objects can share the same environment.  See ?object.size
> for more details.
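A small illustration of that sharing, using a made-up function make_two(): two formulas created in the same call point to one environment, so serialization stores that environment (and the large vector in it) only once, while object.size() counts it for neither formula:

```r
make_two <- function() {
  big <- rnorm(1000000)
  list(f1 = a ~ b, f2 = c ~ d)
}

two <- make_two()

# Both formulas share make_two()'s evaluation frame.
identical(environment(two$f1), environment(two$f2))  # TRUE

# Roughly one copy of `big` (about 8 MB) is serialized, not two ...
length(serialize(two, connection = NULL))

# ... while object.size() ignores the environment altogether.
as.numeric(object.size(two))
```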
>
> Duncan Murdoch

-- 
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics Copenhagen
University of Copenhagen, Oester Farimagsgade 5, 1014 Copenhagen, Denmark



More information about the R-devel mailing list