[R-sig-ME] Question about zero-inflated Poisson glmer

Thu Jun 23 11:50:11 CEST 2016

Dear Philipp,

1. Fit a Poisson model to the data.
2. Simulate a new response vector for the dataset according to the model.
3. Count the number of zeros in the simulated response vector.
4. Repeat steps 2 and 3 a decent number of times and plot a histogram of the
number of zeros in the simulations. If the number of zeros in the original
dataset is larger than in nearly all of the simulations, then the model can't
capture all the zeros. In that case, first try to update the model and repeat
the procedure. If that fails, look for zero-inflated models.
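
The four steps above could be sketched roughly as follows. This is only an
illustration on made-up data: the data frame d, the covariate x and the plain
glm() fit are assumptions for the example, not your model. simulate() also has
a method for glmer fits from lme4, so the same steps should carry over.

```r
# Minimal sketch with made-up data; `d`, `x` and `y` are hypothetical names.
set.seed(42)
d <- data.frame(x = rnorm(500))
d$y <- rpois(500, lambda = exp(-1 + 0.5 * d$x))

# Step 1: fit a Poisson model (a plain glm here, to keep the example
# free of extra packages).
fit <- glm(y ~ x, family = poisson, data = d)

# Steps 2 and 3: simulate new response vectors from the fitted model and
# count the zeros in each of them.
n_sim <- 1000
sims <- simulate(fit, nsim = n_sim)   # data frame, one column per simulation
zeros_sim <- colSums(sims == 0)
zeros_obs <- sum(d$y == 0)

# Step 4: histogram of the simulated zero counts, with the observed
# count marked as a vertical line.
hist(zeros_sim, main = "Zeros per simulated response",
     xlab = "Number of zeros")
abline(v = zeros_obs, col = "red", lwd = 2)

# Share of simulations with at least as many zeros as the observed data.
p_zero <- mean(zeros_sim >= zeros_obs)
p_zero
```

If the observed count sits far in the right tail of the histogram (p_zero
close to 0), the model probably produces too few zeros.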

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-06-23 11:27 GMT+02:00 Philipp Singer <killver op gmail.com>:

> Thanks Thierry - That totally makes sense. Is there some way of formally
> checking that, other than thinking about the setting and underlying processes?
>
> On 23.06.2016 11:04, Thierry Onkelinx wrote:
> > Dear Philipp,
> >
> > Do you have just lots of zeros, or more zeros than the Poisson
> > distribution can explain? Those are two different things. The first
> > example below generates data from a Poisson distribution and has 99%
> > zeros but no zero-inflation. The second example has only 1% zeros but
> > is clearly zero-inflated.
> >
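> > set.seed(1)
> > n <- 1e8
> > # Pure Poisson with a tiny mean: about 99% zeros, no zero-inflation
> > sim <- rpois(n, lambda = 0.01)
> > mean(sim == 0)
> > hist(sim)
> >
> > # Poisson with a large mean plus 1% structural zeros: zero-inflated
> > sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)
> > mean(sim.infl == 0)
> > hist(sim.infl)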
> >
> > So before looking for zero-inflated models, try to model the zeros.
> >
> > Best regards,
> >
> >
> > ir. Thierry Onkelinx
> >
> > 2016-06-23 10:07 GMT+02:00 Philipp Singer <killver op gmail.com>:
> >
> >     Dear group - I am currently fitting a Poisson glmer where I have
> >     an excess of outcomes that are zero (>95%). I am now debating
> >     how to proceed and came up with three options:
> >
> >     1.) Just fit a regular glmer to the complete data. I am not fully
> >     sure how to interpret the coefficients then: do they mainly
> >     reflect the distinction between zero and non-zero outcomes, or do
> >     they also capture the differences among the non-zero outcomes?
> >
> >     2.) Leave all zeros out of the data and fit a glmer to only those
> >     outcomes that are non-zero. Then, I would only learn about
> >     differences in the non-zero outcomes though.
> >
> >     3.) Use a zero-inflated Poisson model. My data is quite
> >     large-scale, so I am currently playing around with the EM
> >     implementation of Bolker et al. that alternates between fitting a
> >     glmer with data that are weighted according to their zero
> >     probability, and fitting a logistic regression for the probability
> >     that a data point is zero. The method is elaborated for the OWL
> >     data in:
> >
> >     https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
> >
> >     I am not fully sure how to interpret the results of the
> >     zero-inflated version though. Would I interpret the coefficients
> >     of the glmer in the same way as for my idea in 2.)? And then, on
> >     top of that, interpret the coefficients of the logistic
> >     regression as describing whether an observation is in the
> >     perfect or imperfect state? I am also not quite sure what the
> >     common approach for the zformula is here. The OWL write-up only
> >     uses zformula=z~1, so no random effect; I would use the same
> >     formula as for the glmer.
> >
> >     I would appreciate some help and pointers.
> >
> >     Thanks!
> >     Philipp
> >
> >     _______________________________________________
> >     R-sig-mixed-models op r-project.org mailing list
> >     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models