[R-sig-ME] Question about zero-inflated Poisson glmer

Thu Jun 23 19:14:33 CEST 2016

  I would also comment that glmmTMB is likely to be much faster than the
lme4-based EM approach ...

  cheers
    Ben B.

On 16-06-23 12:47 PM, Mollie Brooks wrote:
> Hi Philipp,
> 
> You could also try fitting the model with and without ZI using either
> glmmADMB or glmmTMB. Then compare the AICs. I believe model selection
> is useful for this, but I could be missing something since the
> simulation procedure that Thierry described seems to be recommended more
> often.
> 
> https://github.com/glmmTMB/glmmTMB 
> http://glmmadmb.r-forge.r-project.org
> 
> glmmTMB is still in the development phase, but we’ve done a lot of
> testing.
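
As a rough illustration of the with/without-ZI comparison suggested above, here is a minimal sketch on simulated data. It assumes glmmTMB is installed; the data frame `d`, the grouping factor `g` and all simulation settings are made up for the example and have nothing to do with Philipp's data.

```r
# Hypothetical data: counts with a group-level random intercept
# and 40% structural zeros
set.seed(1)
n_groups <- 50
n_per_group <- 40
g <- factor(rep(seq_len(n_groups), each = n_per_group))
x <- rnorm(n_groups * n_per_group)
b <- rnorm(n_groups, sd = 0.3)[as.integer(g)]   # random intercept per group
mu <- exp(0.5 + 0.3 * x + b)
structural_zero <- rbinom(length(x), 1, 0.4)
d <- data.frame(y = (1 - structural_zero) * rpois(length(x), mu),
                x = x, g = g)

if (requireNamespace("glmmTMB", quietly = TRUE)) {
  # Same conditional model, without and with a constant zero-inflation term
  fit_pois <- glmmTMB::glmmTMB(y ~ x + (1 | g), family = poisson, data = d)
  fit_zip  <- glmmTMB::glmmTMB(y ~ x + (1 | g), ziformula = ~1,
                               family = poisson, data = d)
  print(AIC(fit_pois, fit_zip))
}
```

A clearly lower AIC for `fit_zip` would favour the zero-inflated model; on real data the fixed and random effects would of course be those of the actual analysis.
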
> 
> cheers, Mollie
> 
> ------------------------
> Mollie Brooks, PhD
> Postdoctoral Researcher, Population Ecology Research Group
> Department of Evolutionary Biology & Environmental Studies,
> University of Zürich
> http://www.popecol.org/team/mollie-brooks/
> 
> 
>> On 23 Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
>> 
>> Thanks, great information, that is really helpful.
>> 
>> I agree that those are different things, however when using a
>> random effect for overdispersion, I can simulate the same number of
>> zero outcomes (~95%).
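
That an overdispersion term can reproduce a large share of zeros is easy to see in a small simulation. The sketch below is only an illustration under invented settings, not Philipp's model: it draws counts from a Poisson whose log-mean carries a normal observation-level random effect, and compares the share of zeros with what a plain Poisson with the same mean would give. The values of `mu_log` and `sigma` are arbitrary.

```r
set.seed(1)
n <- 1e5
mu_log <- -2   # hypothetical mean of the linear predictor (log scale)
sigma <- 2     # hypothetical SD of the observation-level random effect

# Poisson counts with a log-normal mean, i.e. overdispersed counts
lambda <- exp(rnorm(n, mean = mu_log, sd = sigma))
y <- rpois(n, lambda)

p0_overdispersed <- mean(y == 0)   # share of zeros in the overdispersed counts
p0_poisson <- exp(-mean(y))        # share of zeros for a plain Poisson with the same mean
c(overdispersed = p0_overdispersed, poisson = p0_poisson)
```

Matching the number of zeros in this way does not by itself show that the zeros come from overdispersion rather than from a separate zero-generating process, which is the distinction Thierry draws.
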
>> 
>> On 23.06.2016 15:50, Thierry Onkelinx wrote:
>>> Be careful when using overdispersion to model zero-inflation.
>>> Those are two different things.
>>> 
>>> I've put some information together in 
>>> http://rpubs.com/INBOstats/zeroinflation
>>> 
>>> ir. Thierry Onkelinx
>>> Instituut voor natuur- en bosonderzoek / Research Institute for
>>> Nature and Forest
>>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
>>> Kliniekstraat 25
>>> 1070 Anderlecht
>>> Belgium
>>>
>>> To call in the statistician after the experiment is done may be
>>> no more than asking him to perform a post-mortem examination: he
>>> may be able to say what the experiment died of.
>>> ~ Sir Ronald Aylmer Fisher
>>> The plural of anecdote is not data.
>>> ~ Roger Brinner
>>> The combination of some data and an aching desire for an answer
>>> does not ensure that a reasonable answer can be extracted from a
>>> given body of data.
>>> ~ John Tukey
>>> 
>>> 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>> 
>>> Thanks! Actually, accounting for overdispersion seems to be super
>>> important; once it is included, the zeros can be captured well.
>>> 
>>> 
>>> On 23.06.2016 11:50, Thierry Onkelinx wrote:
>>>> Dear Philipp,
>>>> 
>>>> 1. Fit a Poisson model to the data.
>>>> 2. Simulate a new response vector for the dataset according to
>>>>    the model.
>>>> 3. Count the number of zeros in the simulated response vector.
>>>> 4. Repeat steps 2 and 3 a decent number of times and plot a
>>>>    histogram of the number of zeros in the simulations.
>>>>
>>>> If the number of zeros in the original dataset is larger than in
>>>> the simulations, then the model can't capture all the zeros. In
>>>> that case, first try to update the model and repeat the procedure.
>>>> If that fails, look for zero-inflated models.
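
The steps above can be sketched roughly as follows. To keep the example self-contained it uses a plain `glm()` on simulated data rather than a `glmer()` fit; the data frame `d` and its zero-inflation are invented for illustration. With an lme4 fit, the same `simulate()` call should apply.

```r
# Hypothetical zero-inflated data: half of the observations are structural zeros
set.seed(1)
n <- 2000
x <- rnorm(n)
structural_zero <- rbinom(n, 1, 0.5)
d <- data.frame(x = x,
                y = (1 - structural_zero) * rpois(n, exp(0.5 + 0.3 * x)))

# Step 1: fit a Poisson model to the data
fit <- glm(y ~ x, family = poisson, data = d)

# Steps 2 to 4: simulate response vectors from the fit and count zeros in each
n_sim <- 500
sims <- simulate(fit, nsim = n_sim)   # data frame with one column per simulation
zeros_sim <- colSums(sims == 0)
zeros_obs <- sum(d$y == 0)

# Histogram of simulated zero counts, with the observed count marked in red
hist(zeros_sim, xlim = range(c(zeros_sim, zeros_obs)),
     main = "Zeros in simulated responses", xlab = "Number of zeros")
abline(v = zeros_obs, col = "red", lwd = 2)

# Share of simulations with at least as many zeros as observed
mean(zeros_sim >= zeros_obs)
```

If the red line sits far to the right of the histogram, the fitted model does not generate enough zeros, and the next step is to improve the model before reaching for zero-inflation.
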
>>>> 
>>>> Best regards,
>>>> 
>>>> 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>> 
>>>> Thanks Thierry - That totally makes sense. Is there some way of
>>>> formally checking that, except thinking about the setting and 
>>>> underlying processes?
>>>> 
>>>> On 23.06.2016 11:04, Thierry Onkelinx wrote:
>>>>> Dear Philipp,
>>>>> 
>>>>> Do you have just lots of zeros, or more zeros than the Poisson
>>>>> distribution can explain? Those are two different things. The
>>>>> first example below generates data from a Poisson distribution
>>>>> and has 99% zeros but no zero-inflation. The second example has
>>>>> only 1% zeros but is clearly zero-inflated.
>>>>> 
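>>>>> set.seed(1)
>>>>> n <- 1e8  # large sample; reduce n if memory is limited
>>>>> # Pure Poisson with a small mean: about 99% zeros, no zero-inflation
>>>>> sim <- rpois(n, lambda = 0.01)
>>>>> mean(sim == 0)
>>>>> hist(sim)
>>>>>
>>>>> # Poisson with a large mean, set to zero with probability 0.01:
>>>>> # only about 1% zeros, essentially all in excess of the Poisson
>>>>> sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)
>>>>> mean(sim.infl == 0)
>>>>> hist(sim.infl)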
>>>>> 
>>>>> So before looking for zero-inflated models, try to model the zeros.
>>>>> 
>>>>> Best regards,
>>>>> 
>>>>> 
>>>>> 2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>>> 
>>>>> Dear group - I am currently fitting a Poisson glmer where I have
>>>>> an excess of outcomes that are zero (>95%). I am now debating
>>>>> how to proceed and came up with three options:
>>>>> 
>>>>> 1.) Just fit a regular glmer to the complete data. I am not fully
>>>>> sure how to interpret the coefficients then: are they mostly
>>>>> optimized towards distinguishing zero from non-zero outcomes, or
>>>>> do they also capture the differences among the non-zero outcomes?
>>>>> 
>>>>> 2.) Leave all zeros out of the data and fit a glmer to only those
>>>>> outcomes that are non-zero. Then, I would only learn about
>>>>> differences in the non-zero outcomes though.
>>>>> 
>>>>> 3.) Use a zero-inflated Poisson model. My data is quite
>>>>> large-scale, so I am currently playing around with the EM
>>>>> implementation of Bolker et al. that alternates between fitting
>>>>> a glmer with data that are weighted according to their zero
>>>>> probability, and fitting a logistic regression for the
>>>>> probability that a data point is zero. The method is elaborated
>>>>> for the OWL data in:
>>>>>
>>>>> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
>>>>>
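
For readers unfamiliar with the alternating scheme described above, here is a minimal sketch of an EM loop for a zero-inflated Poisson model. It is not the implementation from the owls write-up: it swaps `glmer()` for a plain `glm()` (so no random effects) in order to run with base R only, uses an intercept-only zero-inflation model, and works on simulated data with made-up parameter values.

```r
# Hypothetical data: 40% structural zeros, Poisson counts otherwise
set.seed(42)
n <- 5000
x <- rnorm(n)
structural_zero <- rbinom(n, 1, 0.4)
d <- data.frame(x = x,
                y = (1 - structural_zero) * rpois(n, exp(0.5 + 0.3 * x)))

# w[i]: current probability that observation i is a structural zero.
# Positive counts can never be structural zeros.
w <- ifelse(d$y == 0, 0.5, 0)

for (iter in 1:100) {
  d$w <- w
  # M-step, count part: Poisson regression that down-weights likely structural zeros
  count_fit <- glm(y ~ x, family = poisson, data = d, weights = 1 - w)
  # M-step, zero part: logistic regression on the current probabilities.
  # quasibinomial gives the same estimates as binomial but accepts
  # fractional responses without a warning.
  zero_fit <- glm(w ~ 1, family = quasibinomial, data = d)

  # E-step: update the structural-zero probabilities for the observed zeros
  mu <- fitted(count_fit)
  p <- fitted(zero_fit)
  w_new <- ifelse(d$y == 0, p / (p + (1 - p) * exp(-mu)), 0)

  if (max(abs(w_new - w)) < 1e-8) break
  w <- w_new
}

coef(count_fit)              # log-scale coefficients of the count process
plogis(coef(zero_fit)[[1]])  # estimated zero-inflation probability
```

In this sketch the Poisson coefficients describe the count process for observations that are not structural zeros, and the logistic part describes the probability of a structural zero. A mixed-model version would replace the first `glm()` call with a weighted `glmer()` fit.
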
>>>>>
>>>>> I am not fully sure how to interpret the results for the
>>>>> zero-inflated version though. Would I need to interpret the
>>>>> coefficients of the glmer in the same way as I would for my
>>>>> idea in 2)? And then, on top of that, interpret the coefficients
>>>>> of the logistic regression regarding whether something is in the
>>>>> perfect or imperfect state? I am also not quite sure what the
>>>>> common approach for the zformula is here. The OWL elaborations
>>>>> only use zformula=z~1, so no random effect; I would use the same
>>>>> formula as for the glmer.
>>>>> 
>>>>> I would appreciate some help and pointers.
>>>>> 
>>>>> Thanks! Philipp
>>>>> 
>>>>> _______________________________________________
>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models