[R] Weighted least squares
Adaikalavan Ramasamy
ramasamy at cancer.org.uk
Tue May 8 14:16:50 CEST 2007
See below.
hadley wickham wrote:
> Dear all,
>
> I'm struggling with weighted least squares, where something that I had
> assumed to be true appears not to be the case. Take the following
> data set as an example:
>
> df <- data.frame(x = runif(100, 0, 100))
> df$y <- df$x + 1 + rnorm(100, sd=15)
>
> I had expected that:
>
> summary(lm(y ~ x, data=df, weights=rep(2, 100)))
> summary(lm(y ~ x, data=rbind(df,df)))
You assign weights to different points according to some external
quality or reliability measure not number of times the data point was
measured.
Look at the estimates and standard error of the two models below:
coefficients( summary(f.w <- lm(y ~ x, data=df, weights=rep(2, 100))) )
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.940765 3.30348066 0.587491 5.582252e-01
x 0.982610 0.05893262 16.673448 2.264258e-30
coefficients( summary( f.u <- lm(y ~ x, data=rbind(df,df) ) ) )
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.940765 2.32408609 0.8350659 4.046871e-01
x 0.982610 0.04146066 23.6998165 1.012067e-59
You can see that they have same coefficient estimates but the second one
has smaller variances.
The repeated values artificially deflates the variance and thus inflates
the precision. This is why you cannot treat replicate data as
independent observations.
> would be equivalent, but they are not. I suspect the difference is
> how the degrees of freedom is calculated - I had expected it to be
> sum(weights), but seems to be sum(weights > 0). This seems
> unintuitive to me:
>
> summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
> summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
>
> What am I missing? And what is the usual way to do a linear
> regression when you have aggregated data?
I would be best to use the individual data points instead of aggregated
data as it allows you to estimate the within-group variations as well.
If you had individual data points, you could try something as follows.
Please check the codes as I am no expert in the area of repeated measures.
x <- runif(100, 0, 100)
y1 <- x + rnorm(100, mean=1, sd=15)
y2 <- y1 + rnorm(100, sd=5)
df <- data.frame( y=c(y1, y2),
x=c(x,x),
subject=factor(rep( paste("p", 1:100, sep=""), 2 ) ))
library(nlme)
summary( lme( y ~ x, random = ~ 1 | subject, data=df ) )
Try reading Pinheiro and Bates (http://tinyurl.com/yvvrr7) or related
material for more information. Hope this helps.
> Thanks,
>
> Hadley
Regards, Adai
More information about the R-help
mailing list