[R-sig-ME] Question about zero-inflated Poisson glmer

Philipp Singer killver at gmail.com
Thu Jun 23 17:22:36 CEST 2016


Thanks, great information, that is really helpful.

I agree that those are different things. However, when I add a random
effect for overdispersion, data simulated from the model contain about
the same share of zero outcomes (~95%) as the observed data.
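
A rough illustration of why overdispersion alone can produce that many
zeros, using simulated data rather than my real data. I assume here that
the overdispersion enters as an observation-level normal random effect on
the log scale; the intercept (-4) and the standard deviation (2) are
made-up values, not estimates from any fitted model.

```r
# Illustration only: intercept and random-effect SD are made-up values.
set.seed(42)
n <- 1e5

# Poisson counts with an observation-level random effect on the log scale.
sigma <- 2
eta <- -4 + rnorm(n, mean = 0, sd = sigma)
y_olre <- rpois(n, lambda = exp(eta))

# Plain Poisson counts with the same overall mean, for comparison.
y_pois <- rpois(n, lambda = mean(y_olre))

# Share of zeros under each data-generating process.
zero_olre <- mean(y_olre == 0)
zero_pois <- mean(y_pois == 0)
c(olre = zero_olre, poisson = zero_pois)
```

The overdispersed counts contain more zeros than a plain Poisson with the
same mean, even though no separate zero-generating process is involved.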

On 23.06.2016 15:50, Thierry Onkelinx wrote:
> Be careful when using overdispersion to model zero-inflation. Those 
> are two different things.
>
> I've put some information together in 
> http://rpubs.com/INBOstats/zeroinflation
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature 
> and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no 
> more than asking him to perform a post-mortem examination: he may be 
> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does 
> not ensure that a reasonable answer can be extracted from a given body 
> of data. ~ John Tukey
>
> 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com>:
>
>     Thanks! It seems that accounting for overdispersion is very
>     important: once I do that, the zeros are captured well.
>
>
>     On 23.06.2016 11:50, Thierry Onkelinx wrote:
>>     Dear Philipp,
>>
>>     1. Fit a Poisson model to the data.
>>     2. Simulate a new response vector for the dataset according to
>>     the model.
>>     3. Count the number of zeros in the simulated response vector.
>>     4. Repeat steps 2 and 3 a decent number of times and plot a
>>     histogram of the number of zeros in the simulations. If the
>>     number of zeros in the original dataset is larger than in
>>     the simulations, then the model can't capture all the zeros. In
>>     that case, first try to update the model and repeat the procedure.
>>     If that fails, look for zero-inflated models.
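
The four steps above can be sketched roughly as follows. This is only a
sketch under assumptions: the data frame `d`, its columns `x` and `y`, and
the coefficients used to generate them are hypothetical toy values, and a
plain `glm` stands in for the mixed model to keep the example
self-contained. For a glmer fit, lme4 also provides a `simulate()` method,
so the same pattern should carry over.

```r
# Hypothetical toy data; replace with your own data frame and formula.
set.seed(123)
d <- data.frame(x = rnorm(500))
d$y <- rpois(nrow(d), lambda = exp(-1 + 0.5 * d$x))

# Step 1: fit a Poisson model.
fit <- glm(y ~ x, family = poisson, data = d)

# Steps 2 to 4: simulate responses from the fit and count zeros in each.
n_sim <- 1000
sims <- simulate(fit, nsim = n_sim)   # data frame, one column per simulation
zeros_sim <- colSums(sims == 0)
zeros_obs <- sum(d$y == 0)

# Histogram of simulated zero counts, with the observed count marked.
hist(zeros_sim, main = "Zeros in simulated responses",
     xlab = "Number of zeros")
abline(v = zeros_obs, col = "red", lwd = 2)

# Share of simulations with at least as many zeros as observed.
mean(zeros_sim >= zeros_obs)
```

If the observed count sits far in the right tail of the histogram (the
share on the last line is close to 0), the model does not reproduce the
zeros in the data.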
>>
>>     Best regards,
>>
>>     2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>
>>         Thanks Thierry - that totally makes sense. Is there some way of
>>         checking that formally, other than thinking about the setting and
>>         the underlying processes?
>>
>>         On 23.06.2016 11:04, Thierry Onkelinx wrote:
>>         > Dear Philipp,
>>         >
>>         > Do you have just lots of zeros, or more zeros than the Poisson
>>         > distribution can explain? Those are two different things. The
>>         > first example below generates data from a Poisson distribution
>>         > and has 99% zeros but no zero-inflation. The second example has
>>         > only 1% zeros but is clearly zero-inflated.
>>         >
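>>         > set.seed(1)
>>         > n <- 1e8  # large; reduce (e.g. to 1e6) if memory is tight
>>         > # Pure Poisson with a small mean: about 99% zeros, no zero-inflation
>>         > sim <- rpois(n, lambda = 0.01)
>>         > mean(sim == 0)
>>         > hist(sim)
>>         >
>>         > # Large Poisson mean with about 1% of values forced to zero
>>         > sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)
>>         > mean(sim.infl == 0)
>>         > hist(sim.infl)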
>>         >
>>         > So before looking for zero-inflated models, try to model the
>>         > zeros.
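
One way to make "more zeros than the Poisson distribution can explain"
concrete is to compare the observed share of zeros with the share that a
Poisson with the same mean would give, i.e. exp(-mean). A small sketch
using the same two simulations; `zero_check` is a hypothetical helper name
introduced here, and n is reduced only to keep the run short.

```r
set.seed(1)
n <- 1e5  # smaller than in the quoted example, to keep it fast

sim <- rpois(n, lambda = 0.01)
sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)

# Observed share of zeros versus the share a Poisson with the same mean predicts.
zero_check <- function(y) {
  c(observed = mean(y == 0), poisson = dpois(0, lambda = mean(y)))
}

zero_check(sim)       # the two shares nearly agree: no zero-inflation
zero_check(sim.infl)  # about 1% observed, essentially none predicted
```

This ignores covariates and random effects, so it is only a first look;
the simulation check on a fitted model is the more informative one.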
>>         >
>>         > Best regards,
>>         >
>>         > 2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>         >
>>         >     Dear group - I am currently fitting a Poisson glmer where I have
>>         >     an excess of outcomes that are zero (>95%). I am now debating
>>         >     how to proceed and came up with three options:
>>         >
>>         >     1.) Just fit a regular glmer to the complete data. I am not fully
>>         >     sure how to interpret the coefficients then: do they mostly
>>         >     distinguish zero from non-zero outcomes, or do they also capture
>>         >     the differences among the non-zero outcomes?
>>         >
>>         >     2.) Leave all zeros out of the data and fit a glmer to only those
>>         >     outcomes that are non-zero. Then, however, I would only learn
>>         >     about differences among the non-zero outcomes.
>>         >
>>         >     3.) Use a zero-inflated Poisson model. My data is quite
>>         >     large-scale, so I am currently playing around with the EM
>>         >     implementation of Bolker et al. that alternates between fitting a
>>         >     glmer with data that are weighted according to their zero
>>         >     probability, and fitting a logistic regression for the probability
>>         >     that a data point is zero. The method is elaborated for the OWL
>>         >     data in:
>>         >
>>         >     https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
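
To make the alternating scheme concrete, here is a minimal sketch of an EM
loop for a zero-inflated Poisson model. It is not the implementation from
the write-up linked above: it uses plain `glm` instead of glmer (so no
random effects), an intercept-only model for the zero part, and toy data
whose parameters (30% structural zeros, log-mean 0.5 + 0.8 * x) are
made-up assumptions.

```r
# Toy data with made-up parameters.
set.seed(2016)
n <- 2000
d <- data.frame(x = rnorm(n))
structural_zero <- rbinom(n, size = 1, prob = 0.3)
d$y <- ifelse(structural_zero == 1, 0L,
              rpois(n, lambda = exp(0.5 + 0.8 * d$x)))

# Start: treat every observed zero as structural with probability 0.5.
d$w <- ifelse(d$y == 0, 0.5, 0)

for (iter in 1:200) {
  # M-step, count part: Poisson fit that down-weights likely structural zeros.
  fit_count <- glm(y ~ x, family = poisson, data = d, weights = 1 - w)
  # M-step, zero part: intercept-only logistic fit to the current weights
  # (quasibinomial avoids warnings about non-integer responses).
  fit_zero <- glm(w ~ 1, family = quasibinomial, data = d)

  # E-step: probability that each observed zero is a structural zero.
  p_zero <- fitted(fit_zero)
  mu <- fitted(fit_count)
  w_new <- ifelse(d$y == 0,
                  p_zero / (p_zero + (1 - p_zero) * exp(-mu)), 0)

  converged <- max(abs(w_new - d$w)) < 1e-6
  d$w <- w_new
  if (converged) break
}

coef(fit_count)         # count-part coefficients, on the log scale
plogis(coef(fit_zero))  # estimated probability of a structural zero
```

In this formulation the count-part coefficients describe the Poisson
process for observations that are not structural zeros, and the zero part
describes the probability of a structural zero.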
>>         >
>>         >     I am not fully sure how to interpret the results of the
>>         >     zero-inflated version, though. Would I interpret the glmer
>>         >     coefficients in the same way as for my option 2)? And, on top of
>>         >     that, interpret the logistic regression coefficients as describing
>>         >     whether something is in the perfect or imperfect state? I am also
>>         >     not quite sure what the common approach for the zformula is here.
>>         >     The OWL elaborations only use zformula=z~1, so no random effect; I
>>         >     would use the same formula as for the glmer.
>>         >
>>         >     I would appreciate some help and pointers.
>>         >
>>         >     Thanks!
>>         >     Philipp
>>         >
>>         >     _______________________________________________
>>         >     R-sig-mixed-models at r-project.org mailing list
>>         >     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>         >
>>         >
>>
>>
>>
>
>

