[R] [Fwd: Re: evaluation question]
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Jan 26 00:36:06 CET 2009
It looks in data and if not found there in environment(formula)
so try this:
mylm <- function(model, wghts) {
lm(model, data.frame(wghts), weights = wghts)
}
On Sun, Jan 25, 2009 at 4:20 PM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
> dear list,
>
> below is an edited version of my response to an r user asking me for
> explaining some issues related to r's evaluation rules. i find the
> problem interesting enough to be forwarded to the list, hopefully for
> comments from whoever may want to extend or correct my explanations.
>
> (i'd like to add that much as i'm happy to receive and answer offline
> mails, questions related to r are best sent directly to the list, where
> the real experts are.)
>
>
> -------- Original Message --------
> Subject: Re: evaluation question
> Date: Sun, 25 Jan 2009 20:32:22 +0100
>
>
>
>
>
>
>
>
> xxx wrote:
>
> <snip>
>
>> Someone sent in an example a few days ago showing that prac1 ( see
>> below ) doesn't work. Then someone else sent two different
>> ways of fixing it.
>> I'm still slightly confused.
>
> <snip>
>
>
>>
>>
>> x<-1:10;
>> y<-rnorm(10) + x;
>>
>> # THIS DOES NOT WORK
>>
>> prac1 <- function( model,wghts){
>> lm( model, weights = wghts)
>> }
>>
>> prac1(model = y~x, wghts = rep(1, 10))
>
> tfm:
>
> " the variables are taken from 'environment(formula)', typically
> the environment from which 'lm' is called. "
>
> when lm is applied to a model, the variable names used to pass arguments
> to lm (here, 'wghts') are looked up in the environment where the model
> was defined. here, you have two environments:
>
> - the global one (say, e_g), where x, y, and prac1 are defined;
> - the call-local one (say, e_p1), created when prac1 is applied.
>
> there is a variable name 'wghts' in the latter, but none in the
> former. just before the call, environmentwise the situation is as follows:
>
> e_g = { 'x':v1, 'y':v2, 'prac1':v3 }
>
> where e_g contains three mappings (of those we are interested here), written here as <name>:<value>, none for
> 'wghts'. (the v1, v2, v3 stand for the respective values, as in the
> code above.)
>
> when you apply prac1, you create a new, local environment:
>
> e_p1 = { 'model':v4, 'wghts':v5 }
>
> where v4 is a promise with the expression 'y~x' and evaluation
> environment e_g (the caller's environment), and v5 is a promise with the
> expression 'rep(1, 10)' and evaluation environment e_g.
>
> when you call lm, things are a little bit more complicated. after some
> black magic is performed on the arguments in the lm call, weights are
> extracted from the model using model.weights, and the lookup is
> performed not in e_p1, but in e_g.
>
> rm(list=ls()) # cleanup
> x = 1:10
> y = rnorm(10)+x
>
> p1 = function(model, wghts)
> lm(model, weights=wghts)
>
> p1(y~x, rep(1,10))
> # (somewhat cryptic) error: no variable named 'wghts' found
>
> wghts = rep(1,10)
> p1(y~x, wghts)
> # now works, e_g has a binding for 'wghts'
> # passing wghts as an argument to p1 makes no difference
>
> note, due to lazy evaluation, the following won't do:
>
> rm(wghts) # cleanup
>
> p1(y~x, wghts<-rep(1,10))
> # wghts still not found in e_g
>
>
> if you happen to generalize your p1 over the additional arguments to be
> passed to lm, ugly surprizes await, too:
>
> p2 = function(model, ...) {
> # some additional code
> lm(model, ...) }
> p2(y~x, weights=rep(1,10))
> # (rather cryptic) error
>
>
> if you want to fit a model with different sets of weights, the following
> won't do:
>
> rm(wghts) # cleanup
> lapply(
> list(rep(1,10), rep(c(0.5, 1.5), 5)), # alternative weight vectors
> function(weights) p1(y~x, weights))
> # wghts not found in e_g, as before
>
> but this, incidentally, will work:
>
> rm(wghts) # cleanup
> lapply(
> list(rep(1,10), rep(c(0.5, 1.5), 5)),
> function(wghts) p1(y~x, wghts))
> # wghts found in e_g, not in e_p1
>
> as will this:
>
> rm(wghts) # cleanup
> lapply(
> list(rep(1,10), rep(c(0.5, 1.5), 5)),
> function(wghts) p1(y~x))
> # wghts found in e_g
>
> but obviously not this:
>
> rm(wghts) # cleanup
> lapply(
> list(rep(1,10), rep(c(0.5, 1.5), 5)),
> function(weights) p1(y~x))
> # wghts not found
>
>
>
>>
>> # SOLUTION # 1
>>
>> prac2 <- function( model,wghts){
>> environment(model) <- environment()
>> lm(model,weights = wghts)
>> }
>>
>> prac2(model = y~x, wghts = rep(1, 10))
>
> environment() returns the local call environment (see e_p1 above), where
> 'wghts' is mapped to a promise to evaluate rep(1,10) in e_g. you set
> the environment of model to e_p1, so that lm looks for wghts there --
> and finds it.
>
> this is an 'elegant' workaround, with possible disastrous
> consequences if the model happens to include a variable named 'model' or
> 'wghts':
>
> model = 1:10
> prac2(y~model, rep(1,10))
> # can't use model in a formula?
>
> wghts = x
> prac2(y~wghts, rep(1,10))
> # oops, not quite the same prac2(y~x, rep(1,10))
>
> another problem with this 'elegant' 'solution' is that if prac_ happens to have
> local variables with names in conflict with names in the model formula, you're
> in trouble again:
>
> prac2 = function(model, wghts) {
> environment(model) = environment()
> x = NULL # for whatever reason one might need an x here
> # whatever
> lm(model, weights = wghts) }
>
> prac2(y~x, rep(1,10))
> # oops, NULL is not good an x in the model
>
> these may be unlikely scenarios, but the issue is serious. you need to
> understand the details of how lm is implemented in order to understand
> why your (intuitively correct) example above did not work, to understand
> why and when your variables can be captured in unexpected ways, and to
> know how to write 'elegant' 'solutions' to avoid the problems.
>
>
> this also means that writing general purpose modules to be used by others is likely
> to cause errors (or silently produce rubbish) if you're not careful enough.
> (or else you need to prepare your code to handle all exotic situations such as
> passing weights to prac1 together with a model where one of the variables is
> named 'model' or 'wghts' -- this is point 2 in a recent comment by Greg Snow,
> which, as he says, stinks hubris.
>
>>
>> # SOLUTION # 2
>>
>> prac3 <- function( model,wghts){
>> cur.env <- environment()
>> lm( model, weights = wghts, data = cur.env )
>> }
>>
>> prac3(model = y~x, wghts = rep(1, 10))
>>
>>
>
> this is an equally 'good' 'solution', with the above comments equally
> applicable here. you'd have to tell the user of your modules which variable
> names are persona non grata.
>
> vQ
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list