[R] Weighted least squares

hadley wickham h.wickham at gmail.com
Wed May 9 08:21:11 CEST 2007


Thanks John,

That's just the explanation I was looking for. I had hoped that there
would be a built-in way of dealing with case weights in R, but
obviously not.

Given that explanation, it still seems to me that the way R
calculates n is suboptimal, as demonstrated by my second example:

summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))

The weights are only very slightly different, but the estimates of the
residual standard error are quite different (20 vs. 14 in my run).
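A minimal R sketch of where the two residual standard errors come from. It assumes that summary.lm() divides the weighted residual sum of squares by the residual degrees of freedom, and that lm() counts those degrees of freedom from the non-zero weights only. The data are simulated as in the original post quoted below. The call set.seed(1) and the helper rse_by_hand() are illustrative additions, not part of R.

```r
# Simulated data as in the original post (the seed is an arbitrary choice)
set.seed(1)
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

fit0 <- lm(y ~ x, data = df, weights = rep(c(0, 2), each = 50))
fit1 <- lm(y ~ x, data = df, weights = rep(c(0.01, 2), each = 50))

# Hypothetical helper: weighted RSS divided by
# (number of non-zero weights - number of estimated coefficients)
rse_by_hand <- function(fit) {
  w <- weights(fit)
  r <- residuals(fit)              # unweighted ("working") residuals
  rdf <- sum(w != 0) - fit$rank
  sqrt(sum(w * r^2) / rdf)
}

c(fit0$df.residual, fit1$df.residual)      # 48 98
c(rse_by_hand(fit0), summary(fit0)$sigma)  # the two values agree
c(rse_by_hand(fit1), summary(fit1)$sigma)  # the two values agree
```

With weights of 0.01, those 50 observations add almost nothing to the weighted RSS. They do, however, raise the residual df from 48 to 98. The residual standard error therefore shrinks by roughly sqrt(48/98), about 0.70, which is in line with the 20 vs. 14 reported above.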

Hadley

On 5/8/07, John Fox <jfox at mcmaster.ca> wrote:
> Dear Hadley,
>
> I think that the problem is that the term "weights" has different meanings,
> which, although they are related, are not quite the same.
>
> The weights used by lm() are (inverse-)"variance weights," reflecting the
> variances of the errors, with observations that have low-variance errors
> therefore being accorded greater weight in the resulting WLS regression.
> What you have are sometimes called "case weights," and I'm unaware of a
> general way of handling them in R, although you could regenerate the
> unaggregated data. As you discovered, you get the same coefficients with
> case weights as with variance weights, but different standard errors.
> Finally, there are "sampling weights," which are inversely proportional to
> the probability of selection; these are accommodated by the survey package.
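A minimal R sketch of the "regenerate the unaggregated data" route. It assumes the case weights are positive integers, so that row i stands for w[i] identical observations. The data-generating code follows the original post. The weights vector w, set.seed(1), and the helper case_weight_se() are hypothetical additions. Expanding the data leaves X'WX and the weighted RSS unchanged. Only the residual df changes, from sum(w != 0) - p to sum(w) - p. Because of that, the variance-weight standard errors can also be rescaled directly, without building the expanded data frame.

```r
# Simulated data, following the original post
set.seed(1)
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)
# Hypothetical integer case weights: row i represents w[i] identical cases
w <- sample(1:3, 100, replace = TRUE)

# lm() treats w as (inverse-)variance weights
fit_var <- lm(y ~ x, data = df, weights = w)

# Regenerate the unaggregated data by repeating row i w[i] times
expanded <- df[rep(seq_len(nrow(df)), times = w), ]
fit_case <- lm(y ~ x, data = expanded)

# Hypothetical helper: case-weight standard errors from the weighted fit,
# replacing the residual df sum(w != 0) - p with sum(w) - p
case_weight_se <- function(fit) {
  df_case <- sum(weights(fit)) - fit$rank
  sqrt(diag(vcov(fit)) * fit$df.residual / df_case)
}

coef(fit_var) - coef(fit_case)   # zero up to rounding
sqrt(diag(vcov(fit_var)))        # SEs treating w as variance weights
case_weight_se(fit_var)          # rescaled SEs
sqrt(diag(vcov(fit_case)))       # SEs from the expanded data
```

The rescaling is exact only under the integer frequency-weight interpretation. It says nothing about sampling weights, which need a design-based approach such as the survey package.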
>
> To complicate matters, this terminology isn't entirely standard.
>
> I hope this helps,
>  John
>
> --------------------------------
> John Fox, Professor
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox
> --------------------------------
>
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch
> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of hadley wickham
> > Sent: Tuesday, May 08, 2007 5:09 AM
> > To: R Help
> > Subject: [R] Weighted least squares
> >
> > Dear all,
> >
> > I'm struggling with weighted least squares, where something
> > that I had assumed to be true appears not to be the case.
> > Take the following data set as an example:
> >
> > df <- data.frame(x = runif(100, 0, 100))
> > df$y <- df$x + 1 + rnorm(100, sd=15)
> >
> > I had expected that:
> >
> > summary(lm(y ~ x, data=df, weights=rep(2, 100)))
> > summary(lm(y ~ x, data=rbind(df,df)))
> >
> > would be equivalent, but they are not. I suspect the
> > difference is in how the degrees of freedom are calculated: I
> > had expected them to be based on sum(weights), but they seem to be
> > based on sum(weights > 0). This seems unintuitive to me:
> >
> > summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
> > summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
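A quick R check of the degrees-of-freedom explanation for the first pair of fits, using the same simulated data (the seed is an arbitrary addition). Both fits have identical coefficients and identical weighted residual sums of squares. Only the residual df differs: 98 from the 100 non-zero weights, versus 198 from the 200 duplicated rows. The ratio of the two residual standard errors should therefore be exactly sqrt(198/98).

```r
# Simulated data as in the original post
set.seed(1)
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

fit_w   <- lm(y ~ x, data = df, weights = rep(2, 100))  # variance weights
fit_dup <- lm(y ~ x, data = rbind(df, df))              # every row duplicated

coef(fit_w) - coef(fit_dup)                    # zero up to rounding
c(fit_w$df.residual, fit_dup$df.residual)      # 98 198
summary(fit_w)$sigma / summary(fit_dup)$sigma  # equals sqrt(198 / 98)
```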
> >
> > What am I missing?  And what is the usual way to do a linear
> > regression when you have aggregated data?
> >
> > Thanks,
> >
> > Hadley
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >