[R] poisson regression with robust error variance ('eyestudy')
Achim.Zeileis at R-project.org
Thu May 8 23:31:00 CEST 2008
Paul & Ted:
> > I can get the estimated RRs from
>
> > RRs <- exp(summary(GLM)$coef[,1])
>
> > but do not see how to implement confidence intervals based
> > on "robust error variances" using the output in GLM.
>
>
> Thanks for the link to the data. Here's my best guess. If you use
> the following approach, with the HC0 type of robust standard errors in
> the "sandwich" package (thanks to Achim Zeileis), you get "almost" the
> same numbers as that Stata output gives. The estimated b's from the
> glm match exactly, but the robust standard errors are a bit off.
Thanks for posting the code reproducing this example!
> ### Paul Johnson 2008-05-08
> ### sandwichGLM.R
> system("wget http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta")
>
> library(foreign)
>
> dat <- read.dta("eyestudy.dta")
>
> ### Ach, stata codes factor contrasts backwards
> dat$carrot0 <- ifelse(dat$carrot==0,1,0)
> dat$gender1 <- ifelse(dat$gender==1,1,0)
>
> glm1 <- glm(lenses~carrot0, data=dat, family=poisson(link=log))
>
> library(sandwich)
> sqrt(diag(vcovHC(glm1, type="HC0")))
> # (Intercept) carrot0
> # 0.1673655 0.1971117
>
> ### Compare against
> ## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm
> ## robust standard errors are:
> ## .1682086 .1981048
I have the solution. Note that the ratio of each of their standard errors to
the corresponding one from sandwich is almost constant, which suggests a
scaling difference. In "sandwich" I have implemented two scaling strategies:
divide by "n" (the number of observations) or by "n-k" (the residual degrees
of freedom). This leads to
R> sqrt(diag(sandwich(glm1)))
(Intercept) carrot0
0.1673655 0.1971117
R> sqrt(diag(sandwich(glm1, adjust = TRUE)))
(Intercept) carrot0
0.1690647 0.1991129
(Equivalently, you could use vcovHC() with types "HC0" and "HC1",
respectively.)
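As a quick sketch of that equivalence on simulated data (simulated because the eyestudy file may not be available; the variable names here are made up), using the "sandwich" package:

```r
## Sketch: sandwich() matches vcovHC(type = "HC0"), and
## sandwich(adjust = TRUE) matches vcovHC(type = "HC1"), for a Poisson GLM.
library(sandwich)

set.seed(42)
d <- data.frame(x = rbinom(100, 1, 0.5))
d$y <- rpois(100, exp(0.3 + 0.4 * d$x))
fit <- glm(y ~ x, data = d, family = poisson(link = log))

all.equal(sandwich(fit), vcovHC(fit, type = "HC0"))                 # TRUE
all.equal(sandwich(fit, adjust = TRUE), vcovHC(fit, type = "HC1"))  # TRUE
```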
Their standard errors lie between those two, so I suspected a different
scaling. Here, n = 100 and n-k = 98, which only leaves
R> sqrt(diag(sandwich(glm1) * 100/99))
(Intercept) carrot0
0.1682086 0.1981048
Bingo! So they scaled with n-1 rather than n or n-k. This is, of course,
also consistent (if the other estimators are), but I wouldn't know of a
good theoretical underpinning for it.
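To make explicit where the n vs. n-k vs. n-1 scalings enter, here is a minimal base-R sketch of the HC0 sandwich estimator for a Poisson GLM with log link, again on simulated data (the eyestudy file may not be at hand); the formulas, not the particular numbers, are the point.

```r
## Manual HC0 sandwich for a Poisson GLM: solve(A) %*% meat %*% solve(A),
## where A = X' diag(mu) X is the information and the meat is built from
## the score contributions (y_i - mu_i) * x_i.
set.seed(1)
n <- 100
x <- rbinom(n, 1, 0.5)
y <- rpois(n, exp(0.2 + 0.5 * x))
fit <- glm(y ~ x, family = poisson(link = log))

X    <- model.matrix(fit)
mu   <- fitted(fit)
S    <- (y - mu) * X              # score contributions, one row per observation
A    <- crossprod(X * mu, X)      # X' diag(mu) X
Ainv <- solve(A)
V_hc0 <- Ainv %*% crossprod(S) %*% Ainv   # HC0: no finite-sample adjustment

k <- length(coef(fit))
se_hc0 <- sqrt(diag(V_hc0))                  # as sandwich(fit)
se_hc1 <- sqrt(diag(V_hc0 * n / (n - k)))    # as sandwich(fit, adjust = TRUE)
se_nm1 <- sqrt(diag(V_hc0 * n / (n - 1)))    # the Stata-style n-1 scaling
```

Since n/(n-k) > n/(n-1) > 1 here, the HC1-style standard errors are the largest and the unadjusted HC0 ones the smallest, with the n-1 scaling in between, just as observed above.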
> glm2 <- glm(lenses~carrot0 +gender1 +latitude, data=dat,
> family=poisson(link=log))
>
> ### UCLA site has:
> ## .4929193 .1963616 .1857414 .0128142
R> round(sqrt(diag(sandwich(glm2) * 100/99)), digits = 7)
(Intercept) carrot0 gender1 latitude
0.4929204 0.1963617 0.1857417 0.0128142
Only the constant is a bit off, but that is not unusual in replication
exercises.
Some further advertising: If you want to get the full table of
coefficients, you can use coeftest() from "lmtest"
R> library("lmtest")
R> coeftest(glm2, sandwich(glm2) * 100/99)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.652122 0.492920 -1.3230 0.18584
carrot0 0.483220 0.196362 2.4609 0.01386 *
gender1 0.205201 0.185742 1.1048 0.26926
latitude -0.010009 0.012814 -0.7811 0.43474
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
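And, coming back to the original question about confidence intervals for the RRs based on the robust error variances: a sketch is to take the coefficients and robust standard errors from the table above, form Wald intervals on the log scale, and exponentiate.

```r
## Wald confidence intervals for the relative risks, using the estimates
## and robust standard errors printed in the coeftest() table above.
b  <- c("(Intercept)" = -0.652122, carrot0 = 0.483220,
        gender1 = 0.205201, latitude = -0.010009)
se <- c(0.492920, 0.196362, 0.185742, 0.012814)

RR    <- exp(b)
lower <- exp(b - qnorm(0.975) * se)
upper <- exp(b + qnorm(0.975) * se)
cbind(RR, lower, upper)
```

Recent versions of "lmtest" also provide coefci(), so something like exp(coefci(glm2, vcov. = sandwich(glm2) * 100/99)) should give these intervals directly.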
Best,
Z