[R-sig-ME] Diagnosing Overdispersion in Mixed Models with Parametric Bootstrapping

Tue May 13 03:37:15 CEST 2014

Xavier Harrison <xav.harrison at ...> writes:

> 
> Hi All

> I have a question regarding the appropriateness of diagnosing
> overdispersion in mixed models using parametric bootstrap as
> provided in the lme4 package.

> Following Zuur et al (2009), Bolker et al (2009), and nicely
> summarised on the glmm wiki, one can approximate overdispersion in a
> model as the ratio of sum of squared residuals to residual degrees
> of freedom. However this suffers the problem of not knowing exactly
> how many residual degrees of freedom there are in a mixed model.

  Yes, although keeping in mind that the whole thing is an approximation
anyway, getting the residual df wrong by a few percent is hopefully
not that important

> My question is, given the flexibility of the parametric
> bootstrapping function in lme4, could one not calculate
> overdispersion by looking at the ratio of the model SS residuals to
> the mean SS residuals from n parametric bootstrap samples? Does it
> follow that if the data are well approximated by a Poisson process,
> this ratio should be roughly 1, whereas if there is extra
> variability in the data than a Poisson can predict', the ratio
> should be a fair bit higher?

  This seems quite reasonable.

> Below I have provided code to reproduce an example along the lines
> of what I have just tried to describe. I have simulated two sets of
> data where a Poisson outcome (number of offspring) is affected by
> body size (with positive slope). In both cases the data are
> generated from Poisson process, but in the latter case I add some
> random noise to the linear predictor drawn from a normal
> distribution with mean 0 and known SD. The code below models the
> effect of body size on offspring, performs parametric bootstrapping,
> and then calculates the ratio of observed model SS residuals to mean
> parametrically bootstrapped SS residuals.  As expected, the "over
> dispersed" data yield a SS ratio of much greater than 1.

  This particular model seems to be more or less exactly equivalent
to using an observation-level random effect -- see below ...

> If this approach seems valid, it seems it would be simple to use sim
> to sim variability to put some CIs on this point estimate.

> Any help greatly appreciated

  The parametric bootstrap is worthwhile for finding reliable
confidence intervals, but otherwise a parametric approach (e.g.
likelihood

  I would do this as follows (slightly more compactly):

n.pops <- 10
n.indiv <- 50

set.seed(101)
popid <- gl(n.pops,n.indiv)
bodysize <- rnorm(length(popid),30,4)
d <- data.frame(popid,bodysize,obs=factor(1:(n.pops*n.indiv)))
library(lme4)
d$y <- simulate(~bodysize+(1|popid)+(1|obs),family="poisson",
         newdata=d,newparams=list(theta=c(0.5,0.8),
                                  beta=c(-0.5,0.05)))[[1]]

f <- glmer(y~bodysize+(1|popid)+(1|obs),family="poisson",
           data=d)
p <- profile(f,which="theta_")
confint(p)

Random effects:
 Groups Name        Std.Dev.
 obs    (Intercept) 0.4795  
 popid  (Intercept) 0.7990  
Number of obs: 500, groups:  obs, 500 popid, 10
Fixed Effects:
(Intercept)     bodysize  
   -1.05462      0.06281  

The intercept looks a bit low, but the std. error is also large
(0.37)

confidence intervals:

           2.5 %    97.5 %
.sig01 0.4076925 0.5568827
.sig02 0.5328584 1.3516996