[R] Discrepant lm() and survreg() standard errors with weighted fits
Kyle Penner
kpenner at as.arizona.edu
Tue Feb 25 21:29:15 CET 2014
> Survreg treats weights as case weights, and lm treats them as sampling weights.
> Here is a simple example. Data set test2 has two copies of every obs in data set test.
>
> > test <- data.frame(x=1:6, y=c(1,3,2,4,6,5))
> > test2 <- test[c(1:6, 1:6),]
>
> > summary(lm( y ~ x, data=test))$coef
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.4000000 0.9039595 0.4424977 0.68100354
> x 0.8857143 0.2321154 3.8158362 0.01884548
>
> > summary(lm( y~x, data=test2))$coef
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.4000000 0.5717142 0.6996503 0.500096805
> x 0.8857143 0.1468027 6.0333668 0.000126369
>
> As expected, the standard error has decreased by a factor of sqrt(2)
> Now fit the model using case weights:
>
> > summary(lm( y~x, data=test, weight= rep(2,6)))$coef
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.4000000 0.9039595 0.4424977 0.68100354
>
> Notice that the answer matches the first run with data set test. Repeat this experiment
> with survreg, and you will find that the weighted run matches data test2. When using the
> robust variance, survreg treats weights as sampling weights, not case weights.
When I use robust=F, I now understand the results:
> summary(survreg(Surv(y)~x, dist='gaussian', data=test, weights=rep(2,6)))$table
Value Std. Error z p
(Intercept) 0.4000000 0.5219013 0.7664285 4.434214e-01
x 0.8857143 0.1340119 6.6092222 3.863445e-11
Log(scale) -0.2321528 0.2041241 -1.1373118 2.554080e-01
When I use robust=T, I do not understand how survreg treats the
weights as sampling weights and arrives at a different standard error
from lm:
> summary(survreg(Surv(y)~x, dist='gaussian', data=test, weights=rep(2,6), robust=T))$table
Value Std. Err (Naive SE) z p
(Intercept) 0.4000000 0.29426260 0.5219013 1.35933 1.740420e-01
x 0.8857143 0.08384353 0.1340119 10.56390 4.380958e-26
Log(scale) -0.2321528 0.08117684 0.2041241 -2.85984 4.238543e-03
More information about the R-help
mailing list