[R-sig-ME] Diagnosing Overdispersion in Mixed Models with Parametric Bootstrapping
Ben Bolker
bbolker at gmail.com
Tue May 13 03:37:15 CEST 2014
Xavier Harrison <xav.harrison at ...> writes:
>
> Hi All
> I have a question regarding the appropriateness of diagnosing
> overdispersion in mixed models using parametric bootstrap as
> provided in the lme4 package.
> Following Zuur et al (2009), Bolker et al (2009), and nicely
> summarised on the glmm wiki, one can approximate overdispersion in a
> model as the ratio of sum of squared residuals to residual degrees
> of freedom. However, this suffers from the problem of not knowing exactly
> how many residual degrees of freedom there are in a mixed model.
Yes, although keep in mind that the whole thing is an approximation
anyway; getting the residual df wrong by a few percent is hopefully
not that important.
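For reference, the usual ratio calculation is essentially the
overdisp_fun from the GLMM FAQ/wiki page mentioned above; a minimal
sketch (df.residual() gives only an approximate residual df, which is
exactly the approximation in question):

overdisp_fun <- function(model) {
    rdf <- df.residual(model)                ## approximate residual df
    rp <- residuals(model, type = "pearson")
    Pearson.chisq <- sum(rp^2)
    ratio <- Pearson.chisq/rdf
    pval <- pchisq(Pearson.chisq, df = rdf, lower.tail = FALSE)
    c(chisq = Pearson.chisq, ratio = ratio, rdf = rdf, p = pval)
}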
> My question is, given the flexibility of the parametric
> bootstrapping function in lme4, could one not calculate
> overdispersion by looking at the ratio of the model SS residuals to
> the mean SS residuals from n parametric bootstrap samples? Does it
> follow that if the data are well approximated by a Poisson process,
> this ratio should be roughly 1, whereas if there is more
> variability in the data than a Poisson can predict, the ratio
> should be a fair bit higher?
This seems quite reasonable.
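For example (a rough sketch only, not run here: 'fit' stands for a
fitted glmer model, the function name is made up, and I use Pearson
residuals for the sums of squares), the ratio of observed to
bootstrap-mean SS could be computed as

boot_disp_ratio <- function(fit, nsim = 200) {
    obs.ss <- sum(residuals(fit, type = "pearson")^2)
    ## simulate nsim response vectors from the fitted model and refit
    sims <- simulate(fit, nsim = nsim)
    boot.ss <- sapply(sims, function(y) {
        sum(residuals(refit(fit, newresp = y), type = "pearson")^2)
    })
    ## point estimate plus a rough quantile-based interval reflecting
    ## sim-to-sim variability
    c(ratio = obs.ss/mean(boot.ss),
      quantile(obs.ss/boot.ss, c(0.025, 0.975)))
}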
> Below I have provided code to reproduce an example along the lines
> of what I have just tried to describe. I have simulated two sets of
> data where a Poisson outcome (number of offspring) is affected by
> body size (with positive slope). In both cases the data are
> generated from a Poisson process, but in the latter case I add some
> random noise to the linear predictor drawn from a normal
> distribution with mean 0 and known SD. The code below models the
> effect of body size on offspring, performs parametric bootstrapping,
> and then calculates the ratio of observed model SS residuals to mean
> parametrically bootstrapped SS residuals. As expected, the
> "overdispersed" data yield an SS ratio much greater than 1.
This particular model seems to be more or less exactly equivalent
to using an observation-level random effect -- see below ...
> If this approach seems valid, it seems it would be simple to use the
> sim-to-sim variability to put some CIs on this point estimate.
> Any help greatly appreciated
The parametric bootstrap is worthwhile for finding reliable
confidence intervals, but otherwise a parametric approach (e.g.
likelihood profiling of the random-effect standard deviations, as
shown below) is probably more direct.
I would do this as follows (slightly more compactly):
library(lme4)
n.pops <- 10
n.indiv <- 50
set.seed(101)
popid <- gl(n.pops, n.indiv)
bodysize <- rnorm(length(popid), 30, 4)
## obs is an observation-level factor (one level per row)
d <- data.frame(popid, bodysize, obs = factor(1:(n.pops*n.indiv)))
## simulate a Poisson response with among-population and
## observation-level (overdispersion) variation; theta gives the SDs
## of the obs and popid intercepts (terms are ordered by decreasing
## number of levels, so obs comes first)
d$y <- simulate(~ bodysize + (1|popid) + (1|obs), family = "poisson",
                newdata = d,
                newparams = list(theta = c(0.5, 0.8),
                                 beta = c(-0.5, 0.05)))[[1]]
## fit the same model and profile the random-effect std devs
f <- glmer(y ~ bodysize + (1|popid) + (1|obs), family = "poisson",
           data = d)
p <- profile(f, which = "theta_")
confint(p)
Random effects:
 Groups Name        Std.Dev.
 obs    (Intercept) 0.4795
 popid  (Intercept) 0.7990
Number of obs: 500, groups:  obs, 500; popid, 10
Fixed Effects:
(Intercept)     bodysize
   -1.05462      0.06281
The intercept looks a bit low, but the std. error is also large
(0.37).

Confidence intervals:
           2.5 %    97.5 %
.sig01 0.4076925 0.5568827
.sig02 0.5328584 1.3516996
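(.sig01 and .sig02 are the obs and popid standard deviations
respectively; they bracket the simulated values of 0.5 and 0.8.)

Another quick likelihood-based check for overdispersion (again just a
sketch, not run) is a likelihood ratio test comparing fits with and
without the observation-level term:

## reduced model without the observation-level random effect;
## the null (sigma_obs = 0) is on the boundary, so the naive LRT
## p-value is conservative (halving it is a common correction)
f0 <- glmer(y ~ bodysize + (1|popid), family = "poisson", data = d)
anova(f0, f)

Parametric-bootstrap confidence intervals on the random-effect
standard deviations are also available via confint(f, method = "boot")
(which calls bootMer), at some computational cost.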