[R] SURVEY PREDICTED SEs: Problem
mo2259 at columbia.edu
mo2259 at columbia.edu
Wed Jul 26 15:50:47 CEST 2006
Hello R-list,
I'm attempting to migrate from Stata to R for my complex survey
work. It has been straight-forward so far except for the
following problem:
I have some code below, but first I'll describe the problem.
When I compute predicted logits from a logistic regression, the
standard errors of the predicted logits are way off (but the
predicted logits are fine). Furthermore, the model logit
coefficients have appropriate SEs. As a comparison, I ran the same
model without the survey design; the predicted SEs come out fine.
Here is example code (first no survey design model and predictions;
then survey design model and predictions):
> #MODEL COEF. ESTIMATES (NO SURVEY DESIGN)
> model.l.nosvy <- glm(qn58~t8l,data=all.stratum,family=binomial)
> summary(model.l.nosvy)
Call:
glm(formula = qn58 ~ t8l, family = binomial, data = all.stratum)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.310 -1.245 1.050 1.111 1.158
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.175890 0.006176 28.48 <2e-16 ***
t8l -0.018643 0.001376 -13.55 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 145934 on 105857 degrees of freedom
Residual deviance: 145750 on 105856 degrees of freedom
AIC: 145754
Number of Fisher Scoring iterations: 3
>
> #PREDICTED SEs
> phat.l.se.logit.nosvy <- predict(model.l.nosvy,se=TRUE)
> as.matrix(table(phat.l.se.logit.nosvy$se.fit))
[,1]
0.00632408017609573 14456
0.00633130215261306 15188
0.00741988836010757 12896
0.00743834214717549 10392
0.00923404822144662 13207
0.00925875968615561 15864
0.0114294663004145 12235
0.0114574202170594 11620
>
> #MODEL COEF. ESTIMATES (SURVEY DESIGN)
> model.l <- svyglm(qn58~t8l,design=all.svy,family=binomial)
> summary(model.l)
Call:
svyglm(qn58 ~ t8l, design = all.svy, family = binomial)
Survey design:
svydesign(id = ~psu, strata = ~stratum, weights = ~weight, data =
all.stratum,
nest = T)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.016004 0.023267 -0.688 0.492
t8l -0.024496 0.004941 -4.958 1.13e-06 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
(Dispersion parameter for binomial family taken to be 0.934964)
Number of Fisher Scoring iterations: 2
>
> #PREDICTED SEs
> phat.l.logit.se <- predict(model.l,se=TRUE)
> as.matrix(table(phat.l.logit.se$se.fit))
[,1]
2.04867522818685 15188
2.05533753780321 14456
2.39885304369985 10392
2.41588959524594 12896
2.98273190185571 15864
3.00556161422958 13207
3.69102305734136 11620
3.71685978156846 12235
#THESE SEs are too large.
More information about the R-help
mailing list