[R] Weighted least squares
hadley wickham
h.wickham at gmail.com
Tue May 8 11:08:34 CEST 2007
Dear all,
I'm struggling with weighted least squares, where something that I had
assumed to be true appears not to be the case. Take the following
data set as an example:
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd=15)
I had expected that:
summary(lm(y ~ x, data=df, weights=rep(2, 100)))
summary(lm(y ~ x, data=rbind(df,df)))
would be equivalent, but they are not. I suspect the difference lies
in how the degrees of freedom are calculated: I had expected them to
be sum(weights), but they seem to be sum(weights > 0). This seems
unintuitive to me:
summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
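To make the discrepancy concrete, here is a minimal sketch that compares the fits directly rather than reading the summaries by eye. The seed and the object names (fit_w, fit_dup, fit_zero, fit_small) are my own additions for illustration; everything else is the example above.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

fit_w   <- lm(y ~ x, data = df, weights = rep(2, 100))  # constant weight of 2
fit_dup <- lm(y ~ x, data = rbind(df, df))              # every row duplicated

# The coefficient estimates agree ...
all.equal(coef(fit_w), coef(fit_dup))

# ... but the residual degrees of freedom do not (100 - 2 versus 200 - 2)
c(weighted = df.residual(fit_w), duplicated = df.residual(fit_dup))

# The standard errors therefore differ by a constant factor
se_w   <- coef(summary(fit_w))[, "Std. Error"]
se_dup <- coef(summary(fit_dup))[, "Std. Error"]
se_dup / se_w
sqrt(df.residual(fit_w) / df.residual(fit_dup))

# Zero weights drop observations from the degrees of freedom,
# while arbitrarily small positive weights do not
fit_zero  <- lm(y ~ x, data = df, weights = rep(c(0, 2), each = 50))
fit_small <- lm(y ~ x, data = df, weights = rep(c(0.01, 2), each = 50))
c(zero = df.residual(fit_zero), small = df.residual(fit_small))
```

So, as far as I can tell, the point estimates match and only the quantities that depend on the degrees of freedom (standard errors, t and p values) change.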
What am I missing? And what is the usual way to do a linear
regression when you have aggregated data?
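By aggregated data I mean something like the following sketch, where each row stands for n identical observations. The data frame agg and its values are made up purely for illustration, and expanding the rows is just one way to get the answer I was expecting, not necessarily the usual one.

```r
# Hypothetical aggregated data: each row represents n identical observations
agg <- data.frame(x = c(10, 20, 30, 40, 50),
                  y = c(12, 19, 33, 38, 55),
                  n = c(3, 1, 4, 2, 5))

# Expand to one row per underlying observation
expanded <- agg[rep(seq_len(nrow(agg)), times = agg$n), c("x", "y")]

fit_expanded <- lm(y ~ x, data = expanded)          # counts treated as real rows
fit_weighted <- lm(y ~ x, data = agg, weights = n)  # counts passed as weights

# Same coefficients, different residual degrees of freedom
all.equal(coef(fit_expanded), coef(fit_weighted))
c(expanded = df.residual(fit_expanded), weighted = df.residual(fit_weighted))
```

The expanded fit uses sum(n) - 2 degrees of freedom, whereas the weighted fit uses the number of rows minus 2.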
Thanks,
Hadley