[R-sig-ME] Huber-White standard errors for glmer Poisson fits
Theodore Lytras
thlytras at gmail.com
Mon Feb 17 11:02:19 CET 2014
Dear list,
I am a PhD student in epidemiology, trying to analyze a multi-site cohort
study with R. I have a series of mixed-effects Poisson model objects fitted
with glmer(), and I am trying to find a way to calculate Huber-White robust
standard errors for these.
The only program I found that calculates Huber-White SEs for mixed-model fits
is GLLAMM in Stata, so I would like to reproduce this functionality in R.
I took as a starting point the package "sandwich" by Achim Zeileis, which
unfortunately provides a sandwich covariance matrix only for glm objects. So I
tried to modify his code (functions estfun.glm(), bread.glm() and sandwich())
and the function robust.se() found in [1] to make it work with glmer objects,
simplifying things as much as possible (focusing only on Poisson fits).
Thus I came up with the code in [2]. Function robustSEglmm() takes a glmer fit
with family="Poisson" and the associated clustering variable and tries to
compute Huber-White standard errors.
What I found was the following: if estfunGlmm() uses the "working" residuals
(provided by residuals.merMod()), as is the case in the original estfun.glm()
and explained in the "sandwich" package documentation, the resulting SEs are
far too large, 6-10 times as those produced by GLLAMM.
However, after reading a few things about the various types of residuals in
[3], I tried replacing the working residuals in estfunGlmm() with "score"
residuals, and to my surprise I obtained SEs very close to those returned by
GLLAMM (±1%).
So my question is: is this approach statistically correct and valid?
Since I can't say I really understand the maths involved (I'm a medical doctor
by profession), I turned to Achim Zeileis for help. Cutting and pasting from
his answer:
[quote]
Well, one needs to work out the "right" residuals: The estfun() needs to
be the derivative of the log-likelihood with respect to the parameters.
And the bread() essentially the second derivative.
Given that the derivative of the log-likelihood is also called the score
function, it may well be that the "score residuals" that you mention above
are what you are looking for. But I'm not familiar enough with the model
classes and their implementation in R to know for sure.
[/quote]
So, could someone more familiar with the internals of lme4 shed more light on
the issue? If this approach to calculate Huber-White standard error is indeed
correct, I guess it would be useful to many people beside myself (I've seen
this discussed before on this list) and would extend the capabilities of R.
Thank you all in advance,
Theodore Lytras
[1] http://diffuseprior.wordpress.com/2013/01/12/the-cluster-bootstrap/
[2] https://gist.github.com/anonymous/9046934
[3]
http://books.google.com/books?id=tOeqO6Hs-6gC&lpg=PA53&vq=residuals&hl=el&pg=PA55#v=onepage&q&f=false
More information about the R-sig-mixed-models
mailing list