[Rd] lm() takes weights from formula environment
John Mount
jmount @end|ng |rom w|n-vector@com
Sun Aug 9 21:01:21 CEST 2020
Doesn't this preclude "y ~ ." style notations?
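To make the concern concrete: if w is stored as a column of d, as suggested in the quoted reply below, then `.` in the formula expands to every non-response column of d, w included, so w ends up as a regressor as well as the weights. A minimal sketch using the thread's toy data. Subtracting w after the dot (`y ~ . - w`) is one possible workaround assumed here for illustration; it was not proposed in the thread.

```r
# Toy data from the thread, with the weights stored as a column of d
d <- data.frame(x = 1:3, y = c(3, 3, 4))
d$w <- c(1, 5, 1)

# 'weights = w' is now found in d first, but '.' also expands to x + w,
# so w is fitted as a predictor in addition to being used as the weights
fit_dot <- lm(y ~ ., data = d, weights = w)
names(coef(fit_dot))  # "(Intercept)" "x" "w"

# Removing w after the dot keeps it out of the model terms while
# 'weights = w' still picks up the column from d
fit_dot_minus <- lm(y ~ . - w, data = d, weights = w)
names(coef(fit_dot_minus))  # "(Intercept)" "x"
coef(fit_dot_minus)         # intercept 15/7 (about 2.143), slope 0.5
```

This relies on remembering to exclude the weights column every time `.` is used, which is exactly the kind of bookkeeping the question is about.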
> On Aug 9, 2020, at 11:56 AM, Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>
> This is fairly clearly documented in ?lm:
>
> "All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula."
>
> There are lots of possible places to look for weights, but this seems to me like a pretty sensible search order. In most cases the environment of the formula will have a parent environment chain that eventually leads to the global environment, so (with no conflicts) your strategy of defining w there will sometimes work, but looks pretty unreliable.
>
> When you say you want to work around this search order, I think the obvious way is to add your w vector to your d dataframe. That way it is guaranteed to be found even if there's a conflicting variable in the formula environment, or the global environment.
>
> Duncan Murdoch
>
> On 09/08/2020 2:13 p.m., John Mount wrote:
>> I know programmers can reason this out from R's lazy argument evaluation rules PLUS the explicit match.call()/eval() that lm() does to work with the passed-in formula and data frame. But, from a statistical user's point of view, this seems counter-productive. At best it works as if the user were passing in the name of the weights variable instead of its values (I know this is the obvious consequence of NSE, non-standard evaluation).
>> lm() takes instance weights from the formula environment. Usually that environment is the interactive (global) environment or a close child of it, and we are lucky enough to have no intervening name collisions, so we don't have a problem. However, it makes programming over formulas for lm() a bit tricky. Here is an example of the issue.
>> Is there any recommended discussion of this, and of how to work around it? In my own work I explicitly set the formula environment and put the weights in that environment.
>> d <- data.frame(x = 1:3, y = c(3, 3, 4))
>> w <- c(1, 5, 1)
>> # works
>> lm(y ~ x, data = d, weights = w)
>> # fails, as weights are taken from the formula environment
>> fn <- function() { # deliberately set up formula with bad value in environment
>> w <- c(-1, -1, -1, -1) # bad weights
>> f <- as.formula(y ~ x) # y ~ x records fn()'s frame (holding the bad w) as its environment; as.formula() returns an existing formula unchanged
>> return(f)
>> }
>> lm(fn(), data = d, weights = w)
>> # Error in model.frame.default(formula = fn(), data = d, weights = w, drop.unused.levels = TRUE) :
>> # variable lengths differ (found for '(weights)')
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
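A minimal sketch of the workaround John describes (explicitly setting the formula environment and putting the weights there), assuming it is acceptable to give the formula a fresh child environment. The helper `lm_with_weights()` and the variable name `.lm_weights` are hypothetical, chosen for illustration; they are not part of R or of the original poster's code.

```r
d <- data.frame(x = 1:3, y = c(3, 3, 4))
w <- c(1, 5, 1)

# The thread's formula factory: its environment holds a conflicting 'w'
fn <- function() {
  w <- c(-1, -1, -1, -1)  # bad weights
  y ~ x                   # formula environment is this function's frame
}

# Hypothetical helper: wrap the formula's environment in a new child
# environment that holds the weights under a private name, then refer to
# that name in lm(). model.frame() evaluates 'weights' first in 'data',
# then in environment(formula), so the private name is found there.
lm_with_weights <- function(f, data, weights) {
  env <- new.env(parent = environment(f))
  env$.lm_weights <- weights
  environment(f) <- env
  lm(f, data = data, weights = .lm_weights)
}

fit <- lm_with_weights(fn(), d, w)
coef(fit)  # intercept 15/7 (about 2.143), slope 0.5: the bad w in fn() is ignored
```

Because the weights expression is still evaluated in data first, a column of data named `.lm_weights` would take precedence; the unusual name only makes such a collision unlikely. Parenting the new environment on `environment(f)` keeps any other objects the formula refers to reachable.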