[R-sig-ME] Zero-inflated data and mixed models

Mon Feb 7 06:19:57 CET 2011

Kris Jones <kjones at ...> writes:

> 
> Hello, 
> 
> I'm working with long term catch data for a rare species of fish, and
> I'm interested in running a generalized mixed model to look at
> environmental variables associated with catch. The problem that I've
> encountered is that the data has a lot of zeros, and I don't believe
> there's code available which deals with zero inflated data, but also
> allows you to use random effects (other than the Bayesian MCMCglmm).
> Does anyone know if that's correct?

  glmmADMB will also do zero-inflated mixed models. However, in its
current form (we're working to change this) it only handles a single
random effect, i.e. one grouping variable. I would think that your
model really ought to have crossed random effects of year and sampling
location, although you might be able to get away with sampling
location as a random effect plus overdispersion (zero-inflated
negative binomial).

> Essentially, I'm analyzing data where sampling occurred at the same
> locations for 15 years (about 30 locations). Because of this, I was
> hoping to have sampling location nested within year as random in the
> model; however, I won't be able to do this using the zero-inflated
> models or hurdle models that are out there.
> 
> One suggestion someone made in another mailing list was having an
> observation level random effect in the model (see code below).
> Previously, the models wouldn't run, but the model ran once this
> observation level random effect was included. The thing I'm wondering
> is whether this is an appropriate method to deal with zero-inflated
> data. I can't really seem to find much support for this approach, so
> I am having trouble feeling comfortable using it. I don't fully
> understand what having the observation level random effect is doing.
> 
> So my dilemma is 1) should I use the less supported observation level
> random effect (which would allow for the random effects I figure
> should be in the model); or 2) should I run a more supported
> zero-inflated/hurdle model which doesn't allow for these random
> effects (then including year and station as fixed effects??).

  If you have enough data per year and per station, then the choice
between treating them as random or as fixed effects starts to matter less.

  The basic point that previous respondents have made is that it's
not immediately clear whether "lots of zeros" necessarily means
zero-inflation -- it could just represent count data with a low
mean and high variance (overdispersion).  Observation-level random
effects are one way to deal with overdispersion; using a negative binomial
(as in glmmADMB) is another.  Also, have you ruled out MCMCglmm?  You
can use it with weak priors ...

Warton, David I. 2005. Many zeros does not mean zero inflation: comparing
the goodness-of-fit of parametric models to multivariate abundance data.
Environmetrics 16, no. 3: 275-289. doi:10.1002/env.702
http://dx.doi.org/10.1002/env.702
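
  To make the "lots of zeros is not necessarily zero-inflation" point
concrete, here is a rough sketch (plain Python, standard library only)
comparing the expected fraction of zeros under three count models that
all share the same mean. The mean of 1, the negative binomial size
k = 0.2 and the observation-level standard deviation of 1.5 are
illustrative assumptions, not values estimated from the catch data, and
the function names are made up for this example.

```python
import math
import random


def p_zero_poisson(mu):
    """P(Y = 0) for a Poisson count with mean mu."""
    return math.exp(-mu)


def p_zero_negbin(mu, k):
    """P(Y = 0) for a negative binomial with mean mu and size k
    (variance mu + mu**2 / k), with no zero-inflation component."""
    return (k / (k + mu)) ** k


def p_zero_poisson_lognormal(mu, sigma, n=100_000, seed=1):
    """Monte Carlo estimate of P(Y = 0) for a Poisson count whose log-mean
    carries an observation-level Normal(0, sigma) effect. The offset
    -sigma**2 / 2 keeps the marginal mean equal to mu."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        lam = mu * math.exp(rng.gauss(0.0, sigma) - sigma ** 2 / 2)
        total += math.exp(-lam)
    return total / n


mu = 1.0  # illustrative mean catch per sample (assumed, not from the data)
print(f"Poisson:             {p_zero_poisson(mu):.3f}")      # about 0.368
print(f"Neg. binomial k=0.2: {p_zero_negbin(mu, 0.2):.3f}")  # about 0.699
print(f"Poisson + OLRE:      {p_zero_poisson_lognormal(mu, 1.5):.3f}")
```

  All three models have the same mean; only the dispersion differs, and
neither overdispersed model has an explicit zero-inflation term. Whether
a zero-inflation component is still needed after that is a
goodness-of-fit question for the actual data.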