[Rd] how to control the environment of a formula
Duncan Murdoch
murdoch.duncan at gmail.com
Thu Apr 18 14:35:11 CEST 2013
On 13-04-18 1:09 AM, Thomas Alexander Gerds wrote:
> Dear List
>
> I have noticed that objects generated with one of my packages use
> a lot of space when saved to disk (object.size did not show this!).
>
> Some debugging revealed that formula and call objects carry along the
> full environment of the function that created them, including objects
> not needed by the formula or call. Here is a sketch of the problem:
>
> ,----
> | test <- function(x){
> | x <- rnorm(1000000)
> | out <- list()
> | out$f <- a~b
> | out
> | }
> | v <- test(1)
> | save(v,file="~/tmp/v.rda")
> | system("ls -lah ~/tmp/v.rda")
> |
> | -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
> `----
>
> I tried replacing a~b in line 3 of the function body (out$f <- a~b) with
>
> ,----
> | as.formula(a~b,env=emptyenv())
> | or
> | as.formula(a~b,env=NULL)
> `----
>
> without the desired effect. Instead, adding either
>
> ,----
> | environment(out$f) <- emptyenv()
> | or
> | environment(out$f) <- NULL
> `----
>
> has the desired effect (i.e. the saved file shrinks). Unfortunately,
> there is a new problem:
>
> ,----
> | test <- function(x){
> | x <- rnorm(1000000)
> | out <- list()
> | out$f <- a~b
> | environment(out$f) <- emptyenv()
> | out
> | }
> | d <- data.frame(a=1,b=1)
> | v <- test(1)
> | model.frame(v$f,data=d)
> |
> | Error in eval(expr, envir, enclos) : could not find function "list"
> `----
>
> The same happens with NULL in place of emptyenv().
>
> Finally, using .GlobalEnv in place of emptyenv() seems to remove both problems.
But it will cause other, less obvious problems. In a formula, the
symbols mean something, and the formula's environment is where they are
looked up when they are not found in the data. By setting the
environment to .GlobalEnv, you're changing that meaning. You'll get
nonsense in certain cases when functions look up those symbols and find
the wrong thing. (I don't have an example at hand, but I imagine it
would be easy to put one together with update().)
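Duncan leaves the example open. The following is a hypothetical sketch built with model.frame() rather than update(); the names make_formula, d, mf_ok and mf_bad are invented for illustration. model.frame() looks up any variable that is not in `data` in environment(formula), so resetting that environment to .GlobalEnv silently swaps in an unrelated global `x`. The same lookup explains the emptyenv() error in the original post: the variables are evaluated as a call to list(), and from the empty environment not even list() is visible.

```r
# Hypothetical illustration: a global 'x' that is unrelated to the 'x'
# the formula was written for.
x <- c(100, 200, 300, 400)

make_formula <- function(reset_env) {
  x <- c(1, 2, 3, 4)            # the 'x' the formula is meant to refer to
  f <- y ~ x
  if (reset_env) environment(f) <- globalenv()
  f
}

# The data supply y but not x, so model.frame() has to look x up in
# environment(formula).
d <- data.frame(y = c(2, 4, 6, 8))

mf_ok  <- model.frame(make_formula(FALSE), data = d)
mf_bad <- model.frame(make_formula(TRUE),  data = d)

print(mf_ok$x)   # the local x: 1 2 3 4
print(mf_bad$x)  # the global x: 100 200 300 400, with no warning
```

Nothing fails here, which is exactly the danger: the second model frame is simply built from the wrong data.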
> My questions:
>
> 1) Why does the env argument of as.formula have no effect?
Because the first argument already had an associated environment. You
passed a ~ b, which evaluates to a formula, and calling as.formula on a
formula just returns it unchanged. The env argument is only used when a
new formula needs to be constructed, e.g. from a character string. (You
can see this in the source code; as.formula is a very simple function.)
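To see the distinction described above, here is a small sketch (check_as_formula is just an illustrative name) that calls as.formula() with env = emptyenv() twice: once on an existing formula and once on a character string.

```r
check_as_formula <- function() {
  frame <- environment()   # this call's evaluation frame

  # a ~ b is already a formula whose environment is this frame,
  # so as.formula() returns it as-is and env is ignored.
  f1 <- as.formula(a ~ b, env = emptyenv())

  # A string must first be turned into a new formula; only then
  # is env applied.
  f2 <- as.formula("a ~ b", env = emptyenv())

  list(frame = frame, f1_env = environment(f1), f2_env = environment(f2))
}

res <- check_as_formula()
identical(res$f1_env, res$frame)    # TRUE: env was ignored
identical(res$f2_env, emptyenv())   # TRUE: env was applied
```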
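> 2) Is there a better way to tell formula not to copy unrelated stuff
> into the associated environment?
Yes, delete it. Nothing is actually copied: the formula keeps a
reference to the frame it was created in, so anything still in that
frame gets saved along with it. For example, you could write your
function as
test <- function(x){
  x <- rnorm(1000000)
  out <- list()
  out$f <- a~b
  rm(x)  # the formula refers to this frame, so drop the big vector from it
  out
}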
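A quick check of the rewritten function, assuming it is called as in the original post: after rm(x), the formula's environment should hold only `out`, and model.frame() should still work because the frame's ancestors (the global and base environments) remain reachable, unlike with emptyenv().

```r
test <- function(x) {
  x <- rnorm(1000000)
  out <- list()
  out$f <- a ~ b
  rm(x)   # removes the binding from the frame the formula refers to
  out
}

v <- test(1)
ls(environment(v$f))   # only "out" is left in the captured frame

# The frame's parent chain still leads to base, so list() etc. are found.
d  <- data.frame(a = 1, b = 1)
mf <- model.frame(v$f, data = d)
```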
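> 3) Why does object.size not show the size of the environments that
> formulas can carry along?
Because many objects can share the same environment, object.size does
not count what an environment contains; otherwise the shared contents
would be counted once for every object that refers to them. See
?object.size for more details.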
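One way to see the gap between object.size() and what ends up on disk is to compare it with the length of serialize() output, which is roughly what save() writes before compression. This is a sketch; with_x and without_x are hypothetical helpers mirroring the original test function with and without rm(x).

```r
with_x <- function() {
  x <- rnorm(1e6)            # about 8 MB of doubles left in the frame
  list(f = a ~ b)
}

without_x <- function() {
  x <- rnorm(1e6)
  out <- list(f = a ~ b)
  rm(x)                      # frame keeps only 'out'
  out
}

big   <- with_x()
small <- without_x()

# object.size() does not look inside the formula's environment ...
object.size(big) == object.size(small)        # TRUE
# ... but serialization writes that environment out.
length(serialize(big,   connection = NULL))   # a little over 8e6 bytes
length(serialize(small, connection = NULL))   # far smaller
```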
Duncan Murdoch