[R] behaviour of formula objects and environment inside functions

Thomas Alexander Gerds tag at biostat.ku.dk
Thu Mar 21 08:19:32 CET 2013


Thanks for your answers, and sorry that I didn't explain the
problem/question sufficiently in the first place. Here it comes:

The problem is that when I create a formula inside a function in which
some large objects exist too, saving the output of the function also
saves the formula's environment, which in this case is large:

test2a <- function(){
    large.object <- rnorm(1000000)
    out <- list(f=formula(u~b))
    out
}
v2a <- test2a()
save(v2a,file="~/tmp/v2.rda")

size of v2.rda: 7.4M

Saving the output of test2a() yields a file of 7.4 megabytes on disk,
even though the output of the function does not depend on the large
object. Given that the formula f is also completely independent of
large.object, this behaviour is surprising. It is even more surprising
that when the same code is evaluated outside the function, in the
global environment, the saved object does not contain large.object:

large.object <- rnorm(1000000)
v3 <- list(f=formula(u~b))
save(v3,file="~/tmp/v3.rda")

size of v3.rda: 128 B
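
The difference can be inspected directly. Below is a minimal sketch,
assuming only base R; it repeats test2a() from above, and the names env
and n.bytes are introduced here purely for illustration. It lists what
the formula's environment holds and measures the serialized size in
memory instead of writing a file. As far as I understand, the global
environment is stored only as a reference when an object is saved,
whereas the evaluation frame of a function call is written out together
with everything in it.

```r
test2a <- function() {
    large.object <- rnorm(1000000)
    out <- list(f = formula(u ~ b))
    out
}
v2a <- test2a()

# The formula remembers the frame of the call to test2a()
env <- environment(v2a$f)
ls(env)                      # the names include "large.object"

# Serializing v2a drags that frame along, including the 8 MB vector
n.bytes <- length(serialize(v2a, NULL))
n.bytes

# A formula created at top level points at the global environment
identical(environment(u ~ b), globalenv())   # TRUE when run at top level
```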

In my set of functions I make sure that the formula is evaluated in an
existing data.frame. Hence, I only want to use the formula to look up
variables in that data.frame, and I would like to get rid of everything
else that the formula carries along.
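
One possible way to drop the extra baggage is to reset the environment
of the formula before returning it. This is only a sketch under the
assumption that it is acceptable for the formula to point at the global
environment; test2b is a hypothetical name that does not appear
elsewhere in this thread.

```r
test2b <- function() {
    large.object <- rnorm(1000000)
    f <- u ~ b
    # Replace the function frame by the global environment, which is
    # serialized as a reference rather than by value
    environment(f) <- globalenv()
    list(f = f)
}
v2b <- test2b()

# The serialized result no longer contains the large vector
n.bytes.b <- length(serialize(v2b, NULL))
n.bytes.b
```

With this change the formula can still be used with a data argument, but
variables missing from the data would then be searched for in the global
environment.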

Thanks Thomas


William Dunlap <wdunlap at tibco.com> writes:

> I didn't see where you said what your goal was in making the
> environment of a formula an empty environment.  I'm guessing that you
> want to make sure the variables in the formula come from the
> data.frame given to a fitting function along with the formula (so that
> typos cause errors for sure instead of sometimes giving an incorrect
> answer).
>
> Note that environment(formula) is used to look up not only the
> variables (and functions) in a formula, but also to look up some
> things used in a call to model.frame.  Hence setting the formula's
> environment to emptyenv() is not very useful - it limits things too
> much.
>
>   > form1 <- y ~ x1 + x2
>   > environment(form1) <- emptyenv()
>   > dat <- data.frame(y=log(1:10), x1=1/(1:10), x2=sqrt(1:10))
>   > fit <- lm(form1, data=dat)
>   Error in eval(expr, envir, enclos) : could not find function "list"
>   > traceback()
>   7: eval(expr, envir, enclos)
>   6: eval(predvars, data, env)
>   5: model.frame.default(formula = form1, data = dat, drop.unused.levels = TRUE)
>   4: model.frame(formula = form1, data = dat, drop.unused.levels = TRUE)
>   3: eval(expr, envir, enclos)
>   2: eval(mf, parent.frame())
>   1: lm(form1, data = dat)
>
> I'm a bit surprised that this error happens - it might be avoided by
> rewriting some stuff in model.frame.  I can avoid it by doing
>   > e <- new.env(parent=emptyenv())
>   > e$list <- base::list
>   > environment(form1) <- e
>   > fit <- lm(form1, data=dat)
> The fix may not be worthwhile because it won't help you with a formula
> like y~x1+sin(x2) - 'sin' will not be found.
>
> You could use environment(form1) <- parent.env(globalenv()) so all
> attached packages may be used but not globalenv().  Since packages
> tend to contain functions and not much data this may help if you are
> just trying to generate errors when there is a typo in the formula.
>
> Knowing why you want the environment of a formula to be empty would
> help answer your question.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Charles Berry
>> Sent: Wednesday, March 20, 2013 7:04 PM
>> To: r-help at stat.math.ethz.ch
>> Subject: Re: [R] behaviour of formula objects and environment inside functions
>> 
>> Thomas Alexander Gerds <tag <at> biostat.ku.dk> writes:
>> 
>> > Dear List
>> > I am looking for the recommended way to create a formula inside a
>> > function with an empty environment. I tried several versions (see
>> > below), and one of them seemed to work, but I don't understand why
>> > there is a difference between .GlobalEnv and the environment
>> > inside a function. I would be grateful for any reference or
>> > explanation or advice.
>> [snip]
>> 
>> From ?formula
>> 
>> Environments:
>> 
>>      A formula object has an associated environment, and this
>>      environment (rather than the parent environment) is used by
>>      'model.frame' to evaluate variables that are not found in the
>>      supplied 'data' argument.
>> 
>> So write four functions that:
>> 
>> 1) creates a formula
>> 2) creates some data
>> 3) evaluates a formula using model.frame (even implicitly with
>>    lm(), say)
>> 4) calls the functions from 1, 2, and 3
>> 
>> When you run '4', the result will depend on the environment of data
>> from 2 and the environment of the formula from 1. If they are both
>> in the same environment, fine. If not, you might get lucky and have
>> the data in a place where it will be found nevertheless.
>> 
>> If you are really unlucky the '4' function will find some other data
>> that match the formula and use it.
>> 
>> HTH,
>> 
>> Chuck
>> 

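The lookup rule quoted from ?formula, and the effect of the
parent.env(globalenv()) suggestion, can be tried out in a small
experiment. This is a sketch only; make_formula, local_x, dat, mf and
res are made-up names for illustration, and the stats package is assumed
to be attached as in a default R session.

```r
make_formula <- function() {
    local_x <- c(10, 20, 30)   # exists only in this function's frame
    y ~ local_x
}
f <- make_formula()
dat <- data.frame(y = c(1, 2, 3))   # deliberately has no column local_x

# local_x is not in dat, so model.frame() falls back to environment(f)
mf <- model.frame(f, data = dat)
mf$local_x

# Start the lookup at the attached packages instead: the missing
# variable can no longer be found, so the call signals an error
environment(f) <- parent.env(globalenv())
res <- tryCatch(model.frame(f, data = dat),
                error = function(e) conditionMessage(e))
res
```

The first call succeeds silently with data from outside the data.frame,
which is the kind of luck (or bad luck) described in the quoted message.
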
-- 
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics Copenhagen
University of Copenhagen, Oester Farimagsgade 5, 1014 Copenhagen, Denmark


