[Rd] weights in lm, glm (PR#9023)

Thomas Lumley tlumley at u.washington.edu
Thu Jun 22 23:56:34 CEST 2006


On Thu, 22 Jun 2006, jsignoro at hsph.harvard.edu wrote:
>
> In the code below, fn1() and fn2() fail with the messages given in the comments.
> Strangely, fn2() fails for all data sets I've tried except for those with 100
> rows.
<snip>
> fn1 <- function(model, data)
> {
> 	w <- runif(nrow(data));
> 	print(lm(model, data=data, weights=w));
> }
>
> fn2 <- function(model, data)
> {
> 	print(lm(model, data=data, weights=runif(nrow(data))));
> }


This is the result of an interaction between a (IMO bad) design choice 
when lm and glm were first introduced in S-PLUS and a (IMO good) design 
choice more recently in R.

The bad design choice was that
   lm(model, data=data, weights=w)
is interpreted more like
   lm(model, data=data, weights=~w)

That is, as far as you can see from the outside, weights=w appears to be 
an ordinary argument passed by value, but it is interpreted as if it were 
a reference by name to a variable in the data= argument.

This still wouldn't be too bad, except that if there is no element of 
data= called "w", lm() looks further. In S-PLUS it looks in the calling 
frame and then in the global workspace. In R it looks at the environment 
where the formula was defined.
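
A minimal sketch of the R lookup rule described above. The names A, 
fit_local, f and w are made up for illustration, and the example assumes 
that no object called w already exists in the workspace or on the search 
path.

```r
# Hypothetical data; A, x, y and fit_local are illustrative names only.
set.seed(1)
A <- data.frame(x = rnorm(20), y = rnorm(20))

fit_local <- function(model, data) {
  w <- rep(2, nrow(data))          # local to this function's frame
  lm(model, data = data, weights = w)
}

# The formula is created here, so this is the environment it remembers.
f <- y ~ x

# 'w' is neither a column of A nor visible from the formula's environment,
# so the local 'w' inside fit_local() is never consulted and the call fails.
res <- tryCatch(fit_local(f, A), error = function(e) conditionMessage(e))
print(res)

# Define a 'w' where the formula was created and the same call succeeds,
# silently using this 'w' (all 1s) rather than the local one (all 2s).
w <- rep(1, nrow(A))
fit <- fit_local(f, A)
print(unique(weights(fit)))
```

The second call is the more worrying case: nothing fails, but the weights 
are not the ones the function computed.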

Neither of these is necessarily what you expect, but people expect a wide 
range of incompatible things, so this isn't decisive.

There are at least two ways to get the result you want.  The simpler 
and cruder way is to make w a column of the data frame. This is 
inefficient in memory if data is very large, and requires that you use a 
name that doesn't conflict with any variable that you already want in the 
model, e.g.

   data$".weights." <- runif(nrow(data))
   lm(model, data=data, weights=.weights.)
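
As a complete function this might look like the sketch below. The function 
name fit_with_column and the name-collision check are additions for 
illustration, not part of the original report.

```r
# Sketch of the column approach; fit_with_column is a hypothetical name.
fit_with_column <- function(model, data) {
  # Refuse to overwrite an existing column or shadow a model variable.
  stopifnot(!".weights." %in% names(data),
            !".weights." %in% all.vars(model))
  data[[".weights."]] <- runif(nrow(data))   # changes only the local copy
  lm(model, data = data, weights = .weights.)
}

set.seed(1)
A <- data.frame(x = rnorm(20), y = rnorm(20))
fit <- fit_with_column(y ~ x, A)
print(coef(fit))
```

The caller's data frame is left unchanged, since the column is added to 
the function's own copy. One thing to watch: a formula with `.` on the 
right-hand side (e.g. y ~ .) should expand to include the added column as 
a predictor, so this sketch is only suitable for formulas that name their 
variables explicitly.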

The other approach is to set the environment of the formula to be the 
current environment. This will work as long as the formula doesn't refer 
to any variables in its original environment:

    environment(model) <- environment()
    w <- runif(nrow(data))
    lm(model, data=data, weights=w)
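
Wrapped in a function, and with the caveat made concrete, this could be 
sketched as follows. The names fit_with_env and make_formula are 
hypothetical, and the failing case assumes there is no object called k in 
the workspace.

```r
# Sketch of the environment approach; fit_with_env is a hypothetical name.
fit_with_env <- function(model, data) {
  # Re-point the local copy of the formula at this function's frame, so
  # that 'w' below is found when lm() evaluates the weights argument.
  environment(model) <- environment()
  w <- runif(nrow(data))
  lm(model, data = data, weights = w)
}

set.seed(1)
A <- data.frame(x = rnorm(20), y = rnorm(20))
fit <- fit_with_env(y ~ x, A)
print(coef(fit))

# The caveat: a formula that relies on a variable from the place where it
# was written. Here 'k' lives only in make_formula()'s frame.
make_formula <- function() {
  k <- 2
  y ~ I(x * k)
}

# After the environment is replaced, 'k' can no longer be found.
res <- tryCatch(fit_with_env(make_formula(), A),
                error = function(e) conditionMessage(e))
print(res)
```

Only the function's copy of the formula is modified, so the caller's 
formula keeps its original environment.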


> # But fn2() works if n=100

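No, it just looks as though it does. I suspect you have a data frame 
called data, with 100 rows, in your workspace. The expression 
runif(nrow(data)) is evaluated in the environment of the formula, so 
"data" there refers to that workspace object, not to the data argument 
of fn2(), and the weights only have the right length when the two 
happen to have the same number of rows.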

In a clean copy of R I get
> fn2(y ~ x, data=A);
Error in runif(n, min, max) : invalid arguments
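
A sketch that reproduces both behaviours. The data frames A and B are made 
up for illustration, and the first call assumes there is no object called 
data in the workspace, so that the name resolves to the function 
utils::data().

```r
fn2 <- function(model, data) {
  lm(model, data = data, weights = runif(nrow(data)))
}

set.seed(1)
A <- data.frame(x = rnorm(100), y = rnorm(100))

# Clean workspace: 'data' in the weights expression is looked up from the
# formula's environment and resolves to the function utils::data();
# nrow() of a function is NULL, so the weights cannot be generated.
clean <- tryCatch(fn2(y ~ x, A), error = function(e) conditionMessage(e))
print(clean)

# A 100-row data frame called 'data' in the workspace makes the same call
# run, because runif(nrow(data)) now happens to have the right length.
data <- data.frame(z = 1:100)
fit <- fn2(y ~ x, A)

# With any other number of rows the length mismatch is reported.
B <- A[1:50, ]
mismatch <- tryCatch(fn2(y ~ x, B), error = function(e) conditionMessage(e))
print(mismatch)
```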

 	-thomas


Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


