[R-sig-ME] Question about zero-inflated Poisson glmer

Thu Jun 23 12:42:05 CEST 2016

Thanks! It seems that accounting for overdispersion is essential: once
I do that, the zeros are captured well.
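A minimal illustration of why overdispersion matters for the zeros. It assumes the overdispersion is modelled with a negative binomial, which is one common option; an observation-level random effect in a glmer is another. For the same mean, a strongly overdispersed negative binomial puts far more mass at zero than a Poisson does. The values `mu = 2` and `k = 0.2` are arbitrary illustrative choices.

```r
# Probability of a zero for the same mean under two count distributions
mu <- 2    # common mean (illustrative value)
k  <- 0.2  # negative binomial size; smaller k means more overdispersion
p0_pois <- dpois(0, lambda = mu)          # equals exp(-mu)
p0_nb   <- dnbinom(0, size = k, mu = mu)  # equals (k / (k + mu))^k
round(c(poisson = p0_pois, negbin = p0_nb), 3)
```

Here the Poisson expects about 14% zeros and the negative binomial about 62%, so a model that can be overdispersed may absorb what initially looks like zero-inflation.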

On 23.06.2016 11:50, Thierry Onkelinx wrote:
> Dear Philipp,
>
> 1. Fit a Poisson model to the data.
> 2. Simulate a new response vector for the dataset according to the model.
> 3. Count the number of zeros in the simulated response vector.
> 4. Repeat steps 2 and 3 a decent number of times and plot a histogram of
> the number of zeros in the simulations. If the number of zeros in the
> original dataset is larger than in the simulations, then the model
> can't capture all the zeros. In that case, first try to update the
> model and repeat the procedure. If that fails, look for zero-inflated
> models.
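The four steps above can be sketched in R as follows. This is a minimal sketch on hypothetical simulated data, using base R `glm()` so it runs without extra packages; `glmer` fits from lme4 also have a `simulate()` method, so the same steps apply to them. The data-generating values are illustrative assumptions.

```r
set.seed(42)
# Hypothetical data: Poisson counts with extra (structural) zeros mixed in
n <- 500
x <- runif(n)
y <- rbinom(n, size = 1, prob = 0.6) * rpois(n, lambda = exp(1 + x))
dat <- data.frame(x = x, y = y)

# Step 1: fit a Poisson model
fit <- glm(y ~ x, family = poisson, data = dat)

# Steps 2-4: simulate 1000 response vectors and count the zeros in each
sims <- simulate(fit, nsim = 1000)   # data frame, one column per simulation
sim_zeros <- colSums(sims == 0)
obs_zeros <- sum(dat$y == 0)

# Histogram of simulated zero counts, observed count marked in red
hist(sim_zeros, xlim = range(c(sim_zeros, obs_zeros)),
     main = "Zeros in simulated responses", xlab = "number of zeros")
abline(v = obs_zeros, col = "red")

# Share of simulations with at least as many zeros as observed
p_val <- mean(sim_zeros >= obs_zeros)
p_val
```

A share near zero, as here, means the fitted model cannot reproduce the observed zeros.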
>
> Best regards,
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature 
> and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no 
> more than asking him to perform a post-mortem examination: he may be 
> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does 
> not ensure that a reasonable answer can be extracted from a given body 
> of data. ~ John Tukey
>
> 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com>:
>
>     Thanks Thierry, that totally makes sense. Is there a way of formally
>     checking that, other than thinking about the setting and the
>     underlying processes?
>
>     On 23.06.2016 11:04, Thierry Onkelinx wrote:
>     > Dear Philipp,
>     >
>     > Do you have just lots of zeros, or more zeros than the Poisson
>     > distribution can explain? Those are two different things. The first
>     > example below generates data from a Poisson distribution and has 99%
>     > zeros but no zero-inflation. The second example has only 1% zeros
>     > but is clearly zero-inflated.
>     >
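>     > set.seed(1)
>     > n <- 1e8
>     > # Plain Poisson with a tiny mean: ~99% zeros, no zero-inflation
>     > sim <- rpois(n, lambda = 0.01)
>     > mean(sim == 0)
>     > hist(sim)
>     >
>     > # 1% structural zeros mixed with Poisson(1000) counts: zero-inflated
>     > sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)
>     > mean(sim.infl == 0)
>     > hist(sim.infl)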
>     >
>     > So before looking for zero-inflated models, try to model the zeros.
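A back-of-the-envelope check of the two examples, without simulating 1e8 values: compare each example's share of zeros with the share a Poisson of the same mean would predict. This is a rough sketch, not a formal test. The first example's mean is 0.01; the second's is 0.99 * 1000 = 990.

```r
# Expected share of zeros under a Poisson with the same mean as each example
p0_plain <- dpois(0, lambda = 0.01)         # first example
p0_infl  <- dpois(0, lambda = 0.99 * 1000)  # second example
c(plain = p0_plain, inflated = p0_infl)
```

The first value (about 0.99) matches the 99% zeros actually produced, so those zeros are expected. The second is effectively 0, so the 1% zeros in the second example are all excess zeros.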
>     >
>     > Best regards,
>     >
>     >
>     > ir. Thierry Onkelinx
>     >
>     > 2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com>:
>     >
>     >     Dear group, I am currently fitting a Poisson glmer where I have
>     >     an excess of zero outcomes (>95%). I am now debating how to
>     >     proceed and have come up with three options:
>     >
>     >     1.) Just fit a regular glmer to the complete data. I am not fully
>     >     sure how to interpret the coefficients then: are they mostly
>     >     optimized towards distinguishing zero from non-zero outcomes, or
>     >     do they also capture the differences among the non-zero outcomes?
>     >
>     >     2.) Leave all zeros out of the data and fit a glmer only to the
>     >     non-zero outcomes. Then, though, I would only learn about
>     >     differences among the non-zero outcomes.
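A caveat on option 2, sketched for a plain Poisson with no covariates and an illustrative λ = 0.5: once the zeros are dropped, the remaining counts follow a zero-truncated Poisson, whose mean is λ / (1 - exp(-λ)) rather than λ. An ordinary Poisson fit to the non-zero outcomes therefore estimates that inflated mean, not λ; a zero-truncated count model accounts for this.

```r
set.seed(7)
lambda <- 0.5
y <- rpois(1e5, lambda)
y_pos <- y[y > 0]  # option 2: drop all zeros

# Intercept-only Poisson fit to the positive counts; its MLE is mean(y_pos)
lambda_naive <- unname(exp(coef(glm(y_pos ~ 1, family = poisson))))
# Theoretical mean of a zero-truncated Poisson
trunc_mean <- lambda / (1 - exp(-lambda))
c(true = lambda, naive = lambda_naive, truncated_mean = trunc_mean)
```

The naive estimate lands near 1.27, the truncated mean, instead of the true 0.5.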
>     >
>     >     3.) Use a zero-inflated Poisson model. My data are quite
>     >     large-scale, so I am currently experimenting with the EM
>     >     implementation of Bolker et al., which alternates between fitting
>     >     a glmer to data weighted according to their estimated probability
>     >     of being a structural zero, and fitting a logistic regression for
>     >     the probability that a data point is a structural zero. The method
>     >     is described for the Owls data in:
>     >     https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
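A stripped-down sketch of the same EM idea, assuming no random effects and using base R `glm()` in place of `glmer()`. It is not Bolker et al.'s code. The simulated data, the starting values and the names `z`, `fit_count` and `fit_zero` are all hypothetical. The count part is a Poisson fit weighted by each observation's probability of not being a structural zero. The zero part is an intercept-only logistic fit, the analogue of `zformula = z ~ 1`.

```r
set.seed(1)
# Hypothetical ZIP data: 30% structural zeros, log(lambda) = 0.5 + 0.8 * x
n <- 2000
x <- rnorm(n)
y <- ifelse(runif(n) < 0.3, 0, rpois(n, exp(0.5 + 0.8 * x)))

z <- ifelse(y == 0, 0.5, 0)  # start: P(structural zero); always 0 for y > 0
for (iter in 1:500) {
  # M-step, count part: Poisson fit weighted by P(not a structural zero)
  fit_count <- glm(y ~ x, family = poisson, weights = 1 - z)
  # M-step, zero part: logistic fit to the soft labels z (intercept only);
  # quasibinomial gives the binomial estimates without non-integer warnings
  fit_zero <- glm(z ~ 1, family = quasibinomial)
  # E-step: posterior probability that each observed zero is structural
  lambda <- fitted(fit_count)
  pi_hat <- fitted(fit_zero)
  z_new <- ifelse(y == 0, pi_hat / (pi_hat + (1 - pi_hat) * exp(-lambda)), 0)
  converged <- max(abs(z_new - z)) < 1e-8
  z <- z_new
  if (converged) break
}
coef(fit_count)         # count-part coefficients (log scale)
plogis(coef(fit_zero))  # estimated structural-zero probability
```

With random effects, the two `glm()` calls would become weighted `glmer()` fits, which is where the cost on large data comes from.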
>     >
>     >     I am not fully sure how to interpret the results of the
>     >     zero-inflated version, though. Would I interpret the coefficients
>     >     from the glmer similarly to option 2? And then, on top of that,
>     >     interpret the coefficients of the logistic regression as
>     >     describing whether an observation is in the perfect or imperfect
>     >     state? I am also not sure what the common approach for the
>     >     zformula is here. The Owls example only uses zformula=z~1, so no
>     >     random effect, whereas I would use the same formula as for the
>     >     glmer.
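For reference, the standard zero-inflated Poisson structure, written in hypothetical notation (π_i the structural-zero probability from the logistic part, λ_i the Poisson mean from the count part, x_i and w_i the covariate vectors of the two parts). The count coefficients β act on the log mean of the Poisson ("imperfect") state, including that state's own sampling zeros, so they are not the same as a fit to the non-zero outcomes only. The zero-part coefficients γ act on the log-odds of being a structural ("perfect") zero.

```latex
\Pr(Y_i = 0) = \pi_i + (1 - \pi_i)\, e^{-\lambda_i}, \qquad
\Pr(Y_i = y) = (1 - \pi_i)\, \frac{\lambda_i^{y} e^{-\lambda_i}}{y!}, \quad y = 1, 2, \dots

\operatorname{logit}(\pi_i) = \mathbf{w}_i^{\top} \boldsymbol{\gamma}, \qquad
\log(\lambda_i) = \mathbf{x}_i^{\top} \boldsymbol{\beta}, \qquad
\mathrm{E}[Y_i] = (1 - \pi_i)\, \lambda_i
```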
>     >
>     >     I would appreciate any help and pointers.
>     >
>     >     Thanks!
>     >     Philipp
>     >
>     >     _______________________________________________
>     >     R-sig-mixed-models at r-project.org mailing list
>     >     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>     >
>     >
>
>
>
>
