[R] Weighted least squares

Wed May 9 13:16:37 CEST 2007

Dear Hadley,

> -----Original Message-----
> From: hadley wickham [mailto:h.wickham at gmail.com] 
> Sent: Wednesday, May 09, 2007 2:21 AM
> To: John Fox
> Cc: R-help at stat.math.ethz.ch
> Subject: Re: [R] Weighted least squares
> 
> Thanks John,
> 
> That's just the explanation I was looking for. I had hoped 
> that there would be a built-in way of dealing with them in 
> R, but obviously not.
> 
> Given that explanation, it still seems to me that the way R 
> calculates n is suboptimal, as demonstrated by my second example:
> 
> summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50))) 
> summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
> 
> the weights are only very slightly different but the 
> estimates of residual standard error are quite different (20 
> vs 14 in my run)
> 

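Observations with 0 weight are literally excluded, while those with very
small weight (relative to others) don't contribute much to the fit.
Consequently you get very similar coefficients but different numbers of
observations, and hence different residual degrees of freedom (48 versus
98 in your example) and different residual standard errors.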
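
A minimal sketch of this point, using simulated data like that in your
original post. The seed, the object names, and the helper rss_w() are
arbitrary choices made for illustration only:

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

w_zero  <- rep(c(0, 2), each = 50)     # first 50 cases are dropped entirely
w_small <- rep(c(0.01, 2), each = 50)  # first 50 cases are kept, barely weighted

fit_zero  <- lm(y ~ x, data = df, weights = w_zero)
fit_small <- lm(y ~ x, data = df, weights = w_small)

# The coefficients are nearly identical ...
rbind(zero = coef(fit_zero), small = coef(fit_small))

# ... but the residual degrees of freedom are 48 versus 98
c(zero = df.residual(fit_zero), small = df.residual(fit_small))

# Residual standard error = sqrt(weighted RSS / residual df);
# rss_w() is a hypothetical helper, not part of base R
rss_w <- function(fit) sum(weights(fit) * residuals(fit)^2)
c(zero  = sqrt(rss_w(fit_zero)  / df.residual(fit_zero)),
  small = sqrt(rss_w(fit_small) / df.residual(fit_small)))
```

The weighted residual sums of squares are almost equal in the two fits,
so nearly all of the difference in residual standard error comes from
dividing by 48 rather than 98.
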

I hope this helps,
 John

> Hadley
> 
> On 5/8/07, John Fox <jfox at mcmaster.ca> wrote:
> > Dear Hadley,
> >
> > I think that the problem is that the term "weights" has different 
> > meanings, which, although they are related, are not quite the same.
> >
> > The weights used by lm() are (inverse-)"variance weights," reflecting
> > the variances of the errors, with observations that have low-variance
> > errors therefore being accorded greater weight in the resulting WLS
> > regression. What you have are sometimes called "case weights," and I'm
> > unaware of a general way of handling them in R, although you could
> > regenerate the unaggregated data. As you discovered, you get the same
> > coefficients with case weights as with variance weights, but different
> > standard errors. Finally, there are "sampling weights," which are
> > inversely proportional to the probability of selection; these are
> > accommodated by the survey package.
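
One way to regenerate the unaggregated data, sketched under the
assumption that the aggregated data frame carries a count column. The
column name n, the seed, and the helper se() are hypothetical and not
taken from the thread:

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

# Hypothetical aggregated data: each row stands for n identical cases
agg <- cbind(df, n = 2L)

# Passing the counts to lm() as weights treats them as variance weights
fit_w <- lm(y ~ x, data = agg, weights = n)

# Regenerate the unaggregated data by repeating each row n times
expanded <- agg[rep(seq_len(nrow(agg)), times = agg$n), ]
fit_e <- lm(y ~ x, data = expanded)

all.equal(coef(fit_w), coef(fit_e))  # same coefficients

# Residual degrees of freedom: 98 versus 198
c(weighted = df.residual(fit_w), expanded = df.residual(fit_e))

# The standard errors differ by the factor sqrt(98 / 198)
se <- function(fit) coef(summary(fit))[, "Std. Error"]
se(fit_e) / se(fit_w)
```

Expanding the rows is practical only when the counts are integers and
the expanded data set is of manageable size.
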
> >
> > To complicate matters, this terminology isn't entirely standard.
> >
> > I hope this helps,
> >  John
> >
> > --------------------------------
> > John Fox, Professor
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> >
> > > -----Original Message-----
> > > From: r-help-bounces at stat.math.ethz.ch 
> > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of hadley 
> > > wickham
> > > Sent: Tuesday, May 08, 2007 5:09 AM
> > > To: R Help
> > > Subject: [R] Weighted least squares
> > >
> > > Dear all,
> > >
> > > I'm struggling with weighted least squares, where something that I
> > > had assumed to be true appears not to be the case.
> > > Take the following data set as an example:
> > >
> > > df <- data.frame(x = runif(100, 0, 100))
> > > df$y <- df$x + 1 + rnorm(100, sd=15)
> > >
> > > I had expected that:
> > >
> > > summary(lm(y ~ x, data=df, weights=rep(2, 100)))
> > > summary(lm(y ~ x, data=rbind(df,df)))
> > >
> > > would be equivalent, but they are not.  I suspect the difference is
> > > how the degrees of freedom is calculated - I had expected it to be
> > > sum(weights), but seems to be sum(weights > 0).  This seems
> > > unintuitive to me:
> > >
> > > summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50))) 
> > > summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
> > >
> > > What am I missing?  And what is the usual way to do a linear 
> > > regression when you have aggregated data?
> > >
> > > Thanks,
> > >
> > > Hadley