[R] poisson regression with robust error variance ('eyestudy')
Achim.Zeileis at R-project.org
Thu May 8 23:31:00 CEST 2008
Paul & Ted:
> > I can get the estimated RRs from
>
> > RRs <- exp(summary(GLM)$coef[,1])
>
> > but do not see how to implement confidence intervals based
> > on "robust error variances" using the output in GLM.
>
>
> Thanks for the link to the data. Here's my best guess. If you use
> the following approach, with the HC0 type of robust standard errors in
> the "sandwich" package (thanks to Achim Zeileis), you get "almost" the
> same numbers as that Stata output gives. The estimated b's from the
> glm match exactly, but the robust standard errors are a bit off.
Thanks for posting the code reproducing this example!
> ### Paul Johnson 2008-05-08
> ### sandwichGLM.R
> system("wget http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta")
>
> library(foreign)
>
> dat <- read.dta("eyestudy.dta")
>
> ### Ach, stata codes factor contrasts backwards
> dat$carrot0 <- ifelse(dat$carrot==0,1,0)
> dat$gender1 <- ifelse(dat$gender==1,1,0)
>
> glm1 <- glm(lenses~carrot0, data=dat, family=poisson(link=log))
>
> library(sandwich)
> sqrt(diag(vcovHC(glm1, type="HC0")))
> # (Intercept) carrot0
> # 0.1673655 0.1971117
>
> ### Compare against
> ## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm
> ## robust standard errors are:
> ## .1682086 .1981048
I have the solution. Note that the ratio of each of their standard errors to
the corresponding one from sandwich is almost constant, which suggests a
scaling difference. In "sandwich" I have implemented two scaling strategies:
divide by "n" (the number of observations) or by "n-k" (the residual degrees
of freedom). This leads to
R> sqrt(diag(sandwich(glm1)))
(Intercept) carrot0
0.1673655 0.1971117
R> sqrt(diag(sandwich(glm1, adjust = TRUE)))
(Intercept) carrot0
0.1690647 0.1991129
(Equivalently, you could use vcovHC() with types "HC0" and "HC1",
respectively.)
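As a quick sketch of that equivalence on simulated data (simulated because the eyestudy file may not be available; the variable names here are made up), using the "sandwich" package:

```r
## Sketch: sandwich() matches vcovHC(type = "HC0"), and
## sandwich(adjust = TRUE) matches vcovHC(type = "HC1"), for a Poisson GLM.
library(sandwich)

set.seed(42)
d <- data.frame(x = rbinom(100, 1, 0.5))
d$y <- rpois(100, exp(0.3 + 0.4 * d$x))
fit <- glm(y ~ x, data = d, family = poisson(link = log))

all.equal(sandwich(fit), vcovHC(fit, type = "HC0"))                 # TRUE
all.equal(sandwich(fit, adjust = TRUE), vcovHC(fit, type = "HC1"))  # TRUE
```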
Their standard errors lie between those two, so I suspected a different
scaling. Here, n = 100 and n-k = 98, which only leaves
R> sqrt(diag(sandwich(glm1) * 100/99))
(Intercept) carrot0
0.1682086 0.1981048
Bingo! So they scaled with n-1 rather than n or n-k. This is, of course,
also consistent (if the other estimators are), but I wouldn't know of a
good theoretical underpinning for it.
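To make explicit where the n vs. n-k vs. n-1 scalings enter, here is a minimal base-R sketch of the HC0 sandwich estimator for a Poisson GLM with log link, again on simulated data (the eyestudy file may not be at hand); the formulas, not the particular numbers, are the point.

```r
## Manual HC0 sandwich for a Poisson GLM: solve(A) %*% meat %*% solve(A),
## where A = X' diag(mu) X is the information and the meat is built from
## the score contributions (y_i - mu_i) * x_i.
set.seed(1)
n <- 100
x <- rbinom(n, 1, 0.5)
y <- rpois(n, exp(0.2 + 0.5 * x))
fit <- glm(y ~ x, family = poisson(link = log))

X    <- model.matrix(fit)
mu   <- fitted(fit)
S    <- (y - mu) * X              # score contributions, one row per observation
A    <- crossprod(X * mu, X)      # X' diag(mu) X
Ainv <- solve(A)
V_hc0 <- Ainv %*% crossprod(S) %*% Ainv   # HC0: no finite-sample adjustment

k <- length(coef(fit))
se_hc0 <- sqrt(diag(V_hc0))                  # as sandwich(fit)
se_hc1 <- sqrt(diag(V_hc0 * n / (n - k)))    # as sandwich(fit, adjust = TRUE)
se_nm1 <- sqrt(diag(V_hc0 * n / (n - 1)))    # the Stata-style n-1 scaling
```

Since n/(n-k) > n/(n-1) > 1 here, the HC1-style standard errors are the largest and the unadjusted HC0 ones the smallest, with the n-1 scaling in between, just as observed above.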
> glm2 <- glm(lenses~carrot0 +gender1 +latitude, data=dat,
> family=poisson(link=log))
>
> ### UCLA site has:
> ## .4929193 .1963616 .1857414 .0128142
R> round(sqrt(diag(sandwich(glm2) * 100/99)), digits = 7)
(Intercept) carrot0 gender1 latitude
0.4929204 0.1963617 0.1857417 0.0128142
Only the constant is a bit off, but that is not unusual in replication
exercises.
Some further advertising: If you want to get the full table of
coefficients, you can use coeftest() from "lmtest"
R> library("lmtest")
R> coeftest(glm2, sandwich(glm2) * 100/99)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.652122 0.492920 -1.3230 0.18584
carrot0 0.483220 0.196362 2.4609 0.01386 *
gender1 0.205201 0.185742 1.1048 0.26926
latitude -0.010009 0.012814 -0.7811 0.43474
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
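And, coming back to the original question about confidence intervals for the RRs based on the robust error variances: a sketch is to take the coefficients and robust standard errors from the table above, form Wald intervals on the log scale, and exponentiate.

```r
## Wald confidence intervals for the relative risks, using the estimates
## and robust standard errors printed in the coeftest() table above.
b  <- c("(Intercept)" = -0.652122, carrot0 = 0.483220,
        gender1 = 0.205201, latitude = -0.010009)
se <- c(0.492920, 0.196362, 0.185742, 0.012814)

RR    <- exp(b)
lower <- exp(b - qnorm(0.975) * se)
upper <- exp(b + qnorm(0.975) * se)
cbind(RR, lower, upper)
```

Recent versions of "lmtest" also provide coefci(), so something like exp(coefci(glm2, vcov. = sandwich(glm2) * 100/99)) should give these intervals directly.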
Best,
Z