[R-sig-ME] testing for overdispersion in the mixed effects logistic regression

Wed Jan 23 17:40:56 CET 2013

Antonio P. Ramos <ramos.grad.student at ...> writes:

> 
> Hi all,
> 
> I would like to check whether I am properly checking for overdispersion in
> a mixed effects logistic regression. Here is the code:
> 
> m1 <- lmer(mortality.under.2 ~ maternal_age_d +B0 + B4 + V102 + BORD +
> WLTHIND5+ time +                       new_time + democracy + V106 +
>  WLTHIND5*time + WLTHIND5*new_time +      democracy*WLTHIND5 +
>              (1 |CASEID)-1,data=brazil1,family=binomial(link="logit"))
> 
> # testing for overdispersion
> n <- length(residuals(m1)) # the number of obs
> k <- length(unique(brazil1$CASEID)) # the number of random effects
> z <- sum(residuals(m1,type=standardize)^2) # squared, standardize residuals
> cat("the overdispersion factor is",z/(n-k),"\n")
> cat("the p-value of the overdispersion test is", pchisq(z,n-k))
> 

  If your response variable is binary (as I'm guessing it is since
you appear to have a single response and not something of the form
cbind(success,failure), but I can't be sure), then testing for
overdispersion doesn't make any sense -- any among-observation
variance is unidentifiable.

  In any case:

* type=standardize is unlikely to work, the normal expectation 
is that type should be a string (e.g. see ?residuals.lme in the nlme
package); however, the residuals method
in the stable version of lme4 doesn't even take a "type" argument

 getMethod("residuals","mer")@.Data
function (object, ...) 
napredict(attr(object at frame, "na.action"), object at resid)
<environment: namespace:lme4.0>

-- yes, this should be better documented!

help("residuals,mer-method") [ugh] tells you that residuals() returns
the Pearson residuals by default:

 ‘resid’: The residuals, y-mu, weighted by the ‘sqrtrWt’ slot (when
          its length is >0).

The last comments are that (1) you should be using lower.tail=FALSE
in your pchisq() call; (2) it's not clear what the "residual df" really
are for random effects models (subtracting the number of random effects
estimates is probably conservative --- but why aren't you subtracting
the fixed effect parameter df too?); (3) the Pearson residuals are
a fairly poor estimate of the scale parameter/overdispersion when the
data are far from conditionally normal (e.g. binomial or Poisson with
small numbers of counts) -- see Venables and Ripley; (4) there's more
information on all of this at http://glmm.wikidot.com/faq#overdispersion.

  Ben Bolker

 (Yes, this should be better documented!)