[R-sig-ME] Huber-White standard errors for glmer Poisson fits

Mon Feb 17 11:02:19 CET 2014

Dear list,

I am a PhD student in epidemiology, trying to analyze a multi-site cohort 
study with R. I have a series of mixed-effects Poisson model objects fitted 
with glmer(), and I am trying to find a way to calculate Huber-White robust 
standard errors for these.

The only program I found that calculates Huber-White SEs for mixed-model fits 
is GLLAMM in Stata, so I would like to reproduce this functionality in R.

I took as a starting point the package "sandwich" by Achim Zeileis, which 
unfortunately provides a sandwich covariance matrix only for glm objects. So I 
tried to modify his code (functions estfun.glm(), bread.glm() and sandwich()) 
and the function robust.se() found in [1] to make it work with glmer objects, 
simplifying things as much as possible (focusing only on Poisson fits). 

Thus I came up with the code in [2]. Function robustSEglmm() takes a glmer fit 
with family="Poisson" and the associated clustering variable and tries to 
compute Huber-White standard errors.

What I found was the following: if estfunGlmm() uses the "working" residuals 
(provided by residuals.merMod()), as is the case in the original estfun.glm() 
and explained in the "sandwich" package documentation, the resulting SEs are 
far too large, 6-10 times as those produced by GLLAMM.

However, after reading a few things about the various types of residuals in 
[3], I tried replacing the working residuals in estfunGlmm() with "score" 
residuals, and to my surprise I obtained SEs very close to those returned by 
GLLAMM (±1%).

So my question is: is this approach statistically correct and valid?

Since I can't say I really understand the maths involved (I'm a medical doctor 
by profession), I turned to Achim Zeileis for help. Cutting and pasting from 
his answer:

[quote]

Well, one needs to work out the "right" residuals: The estfun() needs to 
be the derivative of the log-likelihood with respect to the parameters. 
And the bread() essentially the second derivative.

Given that the derivative of the log-likelihood is also called the score 
function, it may well be that the "score residuals" that you mention above 
are what you are looking for. But I'm not familiar enough with the model 
classes and their implementation in R to know for sure.

[/quote]

So, could someone more familiar with the internals of lme4 shed more light on 
the issue? If this approach to calculate Huber-White standard error is indeed 
correct, I guess it would be useful to many people beside myself (I've seen 
this discussed before on this list) and would extend the capabilities of R.

Thank you all in advance,

Theodore Lytras

[1] http://diffuseprior.wordpress.com/2013/01/12/the-cluster-bootstrap/
[2] https://gist.github.com/anonymous/9046934
[3] 
http://books.google.com/books?id=tOeqO6Hs-6gC&lpg=PA53&vq=residuals&hl=el&pg=PA55#v=onepage&q&f=false