[R] poisson regression with robust error variance ('eyestudy
(Ted Harding)
Ted.Harding at manchester.ac.uk
Thu May 8 23:28:39 CEST 2008
Once again, Paul, many thanks for your thorough examination
of this question! And for spelling out your approach!!!
It certainly looks as though you're very close to target
(or even spot-on).
I've only one comment -- see at end.
On 08-May-08 20:35:38, Paul Johnson wrote:
> Ted Harding said:
>> I can get the estimated RRs from
>
>> RRs <- exp(summary(GLM)$coef[,1])
>
>> but do not see how to implement confidence intervals based
>> on "robust error variances" using the output in GLM.
>
> Thanks for the link to the data. Here's my best guess. If you
> use the following approach, with the HC0 type of robust standard
> errors in the "sandwich" package (thanks to Achim Zeileis),
> you get "almost" the same numbers as that Stata output gives.
> The estimated b's from the glm match exactly, but the robust
> standard errors are a bit off.
>
>### Paul Johnson 2008-05-08
>### sandwichGLM.R
> system("wget http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta")
> library(foreign)
>
> dat <- read.dta("eyestudy.dta")
>### Ach, stata codes factor contrasts backwards
>
> dat$carrot0 <- ifelse(dat$carrot==0,1,0)
> dat$gender1 <- ifelse(dat$gender==1,1,0)
>
> glm1 <- glm(lenses~carrot0, data=dat, family=poisson(link=log))
> summary(glm1)
>
> library(sandwich)
> vcovHC(glm1)
> sqrt(diag(vcovHC(glm1)))
> sqrt(diag(vcovHC(glm1, type="HC0")))
>
>### Result:
># > summary(glm1)
># Call:
># glm(formula = lenses ~ carrot0, family = poisson(link = log),
># data = dat)
>
># Deviance Residuals:
># Min 1Q Median 3Q Max
># -1.1429 -0.9075 0.3979 0.3979 0.7734
>
># Coefficients:
># Estimate Std. Error z value Pr(>|z|)
># (Intercept) -0.8873 0.2182 -4.066 4.78e-05 ***
># carrot0 0.4612 0.2808 1.642 0.101
># ---
># Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
># (Dispersion parameter for poisson family taken to be 1)
>
># Null deviance: 67.297 on 99 degrees of freedom
># Residual deviance: 64.536 on 98 degrees of freedom
># AIC: 174.54
>
># Number of Fisher Scoring iterations: 5
>
># > sqrt(diag(vcovHC(glm1, type="HC0")))
># (Intercept) carrot0
># 0.1673655 0.1971117
>
>### Compare against
>## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm
>## robust standard errors are:
>## .1682086 .1981048
>
> glm2 <- glm(lenses~carrot0 +gender1 +latitude, data=dat,
> family=poisson(link=log))
>
> sqrt(diag(vcovHC(glm2, type="HC0")))
>### Result:
># > summary(glm2)
>
>#Call:
># glm(formula = lenses ~ carrot0 + gender1 + latitude, family =
> poisson(link = log),
># data = dat)
>
># Deviance Residuals:
># Min 1Q Median 3Q Max
># -1.2332 -0.9316 0.2437 0.5470 0.9466
>
># Coefficients:
># Estimate Std. Error z value Pr(>|z|)
># (Intercept) -0.65212 0.69820 -0.934 0.3503
># carrot0 0.48322 0.28310 1.707 0.0878 .
># gender1 0.20520 0.27807 0.738 0.4605
># latitude -0.01001 0.01898 -0.527 0.5980
># ---
># Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
># (Dispersion parameter for poisson family taken to be 1)
>
># Null deviance: 67.297 on 99 degrees of freedom
># Residual deviance: 63.762 on 96 degrees of freedom
># AIC: 177.76
>
># Number of Fisher Scoring iterations: 5
>
># > sqrt(diag(vcovHC(glm2, type="HC0")))
># (Intercept) carrot0 gender1 latitude
># 0.49044963 0.19537743 0.18481067 0.01275001
>
>### UCLA site has:
>
>## .4929193 .1963616 .1857414 .0128142
>
>
> So, either there is some "rounding error" or Stata is not using
> exactly the same algorithm as vcovHC in sandwich.
The above differences look somewhat systematic (though very
small). The percentage differences (vcovHC relative to STATA)
for the two cases you analyse above are
vcovHC "HC0": 0.1673655 0.1971117
STATA : 0.1682086 0.1981048
-------------------------------------
% Difference: -0.5012229 -0.5013003
vcovHC "HC0": 0.49044963 0.19537743 0.18481067 0.01275001
STATA: 0.4929193 0.1963616 0.1857414 .0128142
-------------------------------------------------------------
% Difference: -0.5010293 -0.5012029 -0.5010891 -0.5009287
so, in all cases, very close to -0.501%, despite varying absolute
values. This suggests a very subtle difference in algorithm,
rather than rounding error.
Though there is conceivably a convergence issue in the background.
Does ANYONE here know exactly how STATA does it?
Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-May-08 Time: 22:28:36
------------------------------ XFMail ------------------------------
More information about the R-help
mailing list