[R-sig-ME] Predictions from zero-inflated or hurdle models

Mon Mar 9 22:10:28 CET 2015

Ruben Arslan <rubenarslan at ...> writes:

> 
> Dear list,
> 
> I wanted to ask: Is there any (maybe just back of the envelope) way to
> obtain a response prediction for zero-inflated or hurdle type models?
> I've fit such models in MCMCglmm, but I don't work in ecology and my
> previous experience with explaining such models to "my audience" did not
> bode well. When it comes to humans, the researchers I presented to are not
> used to offspring count being zero-inflated (or acquainted with that
> concept), but in my historical data with high infant mortality, it is (in
> modern data it's actually slightly underdispersed).
> 
> Currently I'm using lme4 and simply splitting my models into two stages
> (finding a mate and having offspring).
> That's okay too, but in one population the effect of interest is not
> clearly visible in either stage, only when both are taken together (but
> then the outcome is zero-inflated).
> I expect to be given a hard time for this and hence thought I'd use a
> binomial model with the outcome offspring>0 as my main model, but that
> turns out to be hard to explain too and doesn't 
> really do the data justice.
> 
> Basically I don't want to be forced to discuss my smallest population as a
> non-replication of the effect because I was insufficiently able to explain
> the statistics behind my reasoning that the effect shows.

  I think the back-of-the-envelope answer would be that for a two-stage
model with a prediction of p_i for the probability of having a non-zero
response (or, in the case of zero-inflated models, the probability of
_not_ having a structural zero) and a prediction of n_i for the conditional
part of the model, the mean predicted value is p_i*n_i, and the variance
of that prediction is _approximately_
(p_i*n_i)^2*(var(p_i)/p_i^2 + var(n_i)/n_i^2)
(a first-order delta-method approximation, assuming that you haven't built
in any correlation between p_i and n_i, which would be hard in lme4 but
_might_ be possible under certain circumstances via a multitype model in
MCMCglmm).
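A minimal Python sketch of that calculation, with a simulation check, follows. It assumes p_i, n_i and their sampling variances are already on the response scale (a probability and a count mean, not the logit/log link scale) and that the two predictions are independent. The function names are hypothetical, and the normal sampling distributions used in the simulation are an illustrative assumption, not something lme4 or MCMCglmm produces.

```python
import random
import statistics


def hurdle_mean_and_var(p, n, var_p, var_n):
    """Back-of-the-envelope prediction for a two-stage (hurdle/ZI) model.

    p            -- predicted Pr(non-zero) (hurdle) or Pr(not a structural zero) (ZI)
    n            -- predicted mean of the conditional (count) part
    var_p, var_n -- sampling variances of those two predictions
    Assumes p and n are uncorrelated (first-order delta method), response scale.
    """
    mean = p * n
    # Relative variances add for a product of independent estimates
    var = mean ** 2 * (var_p / p ** 2 + var_n / n ** 2)
    return mean, var


def simulate_product_var(p, n, sd_p, sd_n, draws=200_000, seed=1):
    """Monte Carlo check: mean and variance of p*n when p and n are drawn
    independently from normal distributions (an illustrative assumption)."""
    rng = random.Random(seed)
    prods = [rng.gauss(p, sd_p) * rng.gauss(n, sd_n) for _ in range(draws)]
    return statistics.fmean(prods), statistics.variance(prods)


if __name__ == "__main__":
    # Made-up illustrative inputs, not estimates from any real fit
    mean, var = hurdle_mean_and_var(0.6, 3.0, 0.03 ** 2, 0.2 ** 2)
    sim_mean, sim_var = simulate_product_var(0.6, 3.0, 0.03, 0.2)
    print(mean, var, sim_mean, sim_var)
```

With those made-up inputs the delta method gives a mean of 1.8 and a variance of 0.0225 (SD 0.15). For independent normals the exact variance of the product adds a small var(p_i)*var(n_i) term (here 0.000036), so the simulated value should land very close to the approximation.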

  Does that help?