[R] Discrepant lm() and survreg() standard errors with weighted fits
Therneau, Terry M., Ph.D.
therneau at mayo.edu
Tue Feb 25 17:50:27 CET 2014
On 02/25/2014 05:00 AM, r-help-request at r-project.org wrote:
> Hi,
>
> I have some measurements and their uncertainties. I'm using an
> uncensored subset of the data for a weighted fit (for now---I'll do a
> fit to the full, censored, dataset when I understand the results).
>
> survreg() reports a much smaller standard error for the model
> parameter than lm(), but only when I use weights. Am I missing
> something? Here is what I'm doing:
Survreg treats weights as case weights, and lm treats them as sampling weights.
Here is a simple example. Data set test2 has two copies of every obs in data set test.
> test <- data.frame(x=1:6, y=c(1,3,2,4,6,5))
> test2 <- test[c(1:6, 1:6),]
> summary(lm( y ~ x, data=test))$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4000000 0.9039595 0.4424977 0.68100354
x 0.8857143 0.2321154 3.8158362 0.01884548
> summary(lm( y~x, data=test2))$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4000000 0.5717142 0.6996503 0.500096805
x 0.8857143 0.1468027 6.0333668 0.000126369
As expected, the standard error has decreased by a factor of sqrt(2)
Now fit the model using case weights:
> summary(lm( y~x, data=test, weight= rep(2,6)))$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4000000 0.9039595 0.4424977 0.68100354
Notice that the answer matches the first run with data set test. Repeat this experiment
with survreg, and you will find that the weighted run matches data test2. When using the
robust variance, survreg treats weights as sampling weights, not case weights.
What is the "right" behavior? Neither or both: the writer of the routine simply makes a
choice and sticks with it. If you really care about this read up on the survey package
which cares about this type type of issue, in detail, and does it right. An intermediate
step is to use a software system (stata for example) that explicitly supports more than
one kind of weight.
More information about the R-help
mailing list