[Rd] weights in lm, glm (PR#9023)
Thomas Lumley
tlumley at u.washington.edu
Thu Jun 22 23:56:34 CEST 2006
On Thu, 22 Jun 2006, jsignoro at hsph.harvard.edu wrote:
>
> In the code below, fn1() and fn2() fail with the messages given in the comments.
> Strangely, fn2() fails for all data sets I've tried except for those with 100
> rows.
<snip>
> fn1 <- function(model, data)
> {
> w <- runif(nrow(data));
> print(lm(model, data=data, weights=w));
> }
>
> fn2 <- function(model, data)
> {
> print(lm(model, data=data, weights=runif(nrow(data))));
> }
This is the result of an interaction between a (IMO bad) design choice
when lm and glm were first introduced in S-PLUS and a (IMO good) design
choice more recently in R.
The bad design choice was that
lm(model, data=data, weights=w)
is interpreted more like
lm(model, data=data, weights=~w)
That is, as far as you can see from the outside, weights=w appears to be
an ordinary argument passed by value, but it is interpreted as if it were
a reference by name to a column of the data= argument.
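A minimal sketch of this rule, using a made-up data frame `d` that happens to contain a column called `w` while a different `w` also exists outside it (all names and values here are hypothetical):

```r
# Hypothetical data: the column d$w gives the last row a large weight
d <- data.frame(x = 1:5,
                y = c(1, 3, 2, 5, 4),
                w = c(1, 1, 1, 1, 10))

# A different `w` in the calling (top-level) environment
w <- rep(1, 5)

# weights = w is resolved against the columns of d first
fit <- lm(y ~ x, data = d, weights = w)

# The fitted weights are the column d$w, not the free variable w
print(weights(fit))
```

The free variable `w` is only consulted when `data=` has no column of that name.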
This still wouldn't be too bad, except that if there is no element of
data= called "w", lm() looks further. In S-PLUS it looks in the calling
frame and then in the global workspace. In R it looks in the environment
where the formula was created, i.e. environment(model).
Neither of these is necessarily what you expect, but people expect a wide
range of incompatible things, so this isn't decisive.
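The consequence for fn1() can be sketched as follows. The helper `fit_local()`, the variable `local_w` and the data frame `d` are hypothetical stand-ins; the formula is created at top level, so its environment is the workspace rather than the function's frame:

```r
# Hypothetical helper mirroring fn1(): the weights live only in its frame
fit_local <- function(model, data) {
  local_w <- runif(nrow(data))   # not visible from environment(model)
  lm(model, data = data, weights = local_w)
}

d <- data.frame(x = 1:10, y = c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9))
f <- y ~ x   # created at top level, so environment(f) is the workspace

# model.frame() searches d, then environment(f); local_w is in neither
res <- tryCatch(fit_local(f, d), error = function(e) conditionMessage(e))
print(res)
```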
There are at least two ways to get the result you want. The simpler
and cruder way is to make w a column of the data frame. This is inefficient
in memory if data is very large, and it requires a name that doesn't
conflict with any variable you already want in the model, e.g.
data$.weights. <- runif(nrow(data))
lm(model, data = data, weights = .weights.)
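Wrapped into a function, the first workaround might look like the sketch below (`fit_with_column()` and `d` are hypothetical names). Because R copies `data` when it is modified inside the function, the caller's data frame is left unchanged:

```r
# Hypothetical wrapper: store the weights as a column of the local copy of data
fit_with_column <- function(model, data) {
  data$.weights. <- runif(nrow(data))   # modifies the local copy only
  lm(model, data = data, weights = .weights.)
}

d <- data.frame(x = 1:10, y = c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9))
fit <- fit_with_column(y ~ x, d)
print(weights(fit))
```

One caveat, as far as I can tell: with a formula such as `y ~ .`, the dot would also expand to include the `.weights.` column as a predictor, so this approach is safest with formulas that name their variables explicitly.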
The other approach is to set the environment of the formula to be the
current environment. This works as long as the formula doesn't refer
to any variables in its original environment, e.g.
environment(model) <- environment()
w <- runif(nrow(data))
lm(model, data = data, weights = w)
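The second workaround as a complete function, in a sketch with the hypothetical name `fit_with_env()`. Here `environment()` returns the function's own evaluation frame, so model.frame() finds the local `w` after failing to find it among the columns of `data`:

```r
# Hypothetical wrapper: point the formula's environment at this frame
fit_with_env <- function(model, data) {
  environment(model) <- environment()   # model.frame() now searches this frame
  w <- runif(nrow(data))
  lm(model, data = data, weights = w)
}

d <- data.frame(x = 1:10, y = c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9))
fit <- fit_with_env(y ~ x, d)
print(weights(fit))
```

As noted above, this breaks a formula that relies on variables from the environment where it was originally created.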
> # But fn2() works if n=100
No, it just looks as though it does. I suspect you have a data frame
called data, with 100 rows, in your workspace.
In a clean copy of R I get
> fn2(y ~ x, data=A);
Error in runif(n, min, max) : invalid arguments
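A sketch of why fn2() fails in a clean session (`A` is hypothetical example data with no column called `data`). The expression `runif(nrow(data))` is evaluated in `A` and then in the formula's environment, the workspace, so `data` there is not the function argument but, when no workspace object has that name, the function `utils::data()`:

```r
fn2 <- function(model, data) {
  lm(model, data = data, weights = runif(nrow(data)))
}

# Hypothetical data with no column called "data"
A <- data.frame(x = 1:20, y = (1:20) %% 7)

# What `data` resolves to from the workspace: a function, whose nrow() is NULL
print(nrow(utils::data))

# runif(NULL) is an error, so the fit fails
err <- tryCatch(fn2(y ~ x, data = A), error = function(e) e)
print(conditionMessage(err))
```

If a 100-row data frame called `data` does exist in the workspace, the lookup finds it instead and `runif(100)` supplies weights unrelated to the argument, which would make only 100-row inputs appear to work.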
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle