[R] SURVEY PREDICTED SEs: Problem

Wed Jul 26 15:50:47 CEST 2006

Hello R-list,
I'm attempting to migrate from Stata to R for my complex survey
work.    It has been straight-forward so far except for the
following problem:

I have some code below, but first I'll describe the problem.

When I compute predicted logits from a logistic regression, the
standard errors of the predicted logits are way off (but the
predicted logits are fine).  Furthermore, the model logit 
coefficients have appropriate SEs. As a comparison, I ran the same
model without the survey design; the predicted SEs come out fine.

Here is example code (first no survey design model and predictions;
then survey design model and predictions):

> #MODEL COEF. ESTIMATES (NO SURVEY DESIGN)
> model.l.nosvy <- glm(qn58~t8l,data=all.stratum,family=binomial)
> summary(model.l.nosvy)

Call:
glm(formula = qn58 ~ t8l, family = binomial, data = all.stratum)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.310  -1.245   1.050   1.111   1.158

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.175890   0.006176   28.48   <2e-16 ***
t8l         -0.018643   0.001376  -13.55   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 145934  on 105857  degrees of freedom
Residual deviance: 145750  on 105856  degrees of freedom
AIC: 145754

Number of Fisher Scoring iterations: 3

>
> #PREDICTED SEs
> phat.l.se.logit.nosvy <- predict(model.l.nosvy,se=TRUE)
> as.matrix(table(phat.l.se.logit.nosvy$se.fit))
                     [,1]
0.00632408017609573 14456
0.00633130215261306 15188
0.00741988836010757 12896
0.00743834214717549 10392
0.00923404822144662 13207
0.00925875968615561 15864
0.0114294663004145  12235
0.0114574202170594  11620
>
> #MODEL COEF. ESTIMATES (SURVEY DESIGN)
> model.l <- svyglm(qn58~t8l,design=all.svy,family=binomial)
> summary(model.l)

Call:
svyglm(qn58 ~ t8l, design = all.svy, family = binomial)

Survey design:
svydesign(id = ~psu, strata = ~stratum, weights = ~weight, data =
all.stratum,
    nest = T)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.016004   0.023267  -0.688    0.492
t8l         -0.024496   0.004941  -4.958 1.13e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 0.934964)

Number of Fisher Scoring iterations: 2

>
> #PREDICTED SEs
> phat.l.logit.se <- predict(model.l,se=TRUE)
> as.matrix(table(phat.l.logit.se$se.fit))
                  [,1]
2.04867522818685 15188
2.05533753780321 14456
2.39885304369985 10392
2.41588959524594 12896
2.98273190185571 15864
3.00556161422958 13207
3.69102305734136 11620
3.71685978156846 12235

#THESE SEs are too large.