[R-sig-ME] Question about zero-inflated Poisson glmer
Mollie Brooks
mbrooks at ufl.edu
Thu Jun 23 18:47:29 CEST 2016
Hi Philipp,
You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB, and then compare the AICs. I believe model selection is useful for this, but I could be missing something, since the simulation procedure that Thierry described seems to be recommended more often.
https://github.com/glmmTMB/glmmTMB
http://glmmadmb.r-forge.r-project.org
glmmTMB is still in the development phase, but we’ve done a lot of testing.
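
A minimal sketch of that comparison, on simulated data rather than Philipp's. It assumes glmmTMB is installed; the variable names (y, x, g) and all simulation settings are made up for illustration.

```r
library(glmmTMB)

set.seed(1)
n_groups <- 30
n_per    <- 50

# Hypothetical data: one covariate x and one grouping factor g
d <- data.frame(
  g = factor(rep(seq_len(n_groups), each = n_per)),
  x = rnorm(n_groups * n_per)
)
b  <- rnorm(n_groups, sd = 0.5)                    # random intercepts
mu <- exp(0.5 + 0.3 * d$x + b[as.integer(d$g)])    # conditional Poisson mean

# About 40% structural zeros on top of the Poisson counts
d$y <- rbinom(nrow(d), size = 1, prob = 0.6) * rpois(nrow(d), lambda = mu)

# Same conditional model, without and with a constant zero-inflation term
fit_pois <- glmmTMB(y ~ x + (1 | g), data = d, family = poisson)
fit_zip  <- glmmTMB(y ~ x + (1 | g), data = d, family = poisson,
                    ziformula = ~ 1)

# Lower AIC indicates the better-supported model
AIC(fit_pois, fit_zip)
```

Here ziformula = ~ 1 fits a single zero-inflation probability for all observations; covariates can be put on the right-hand side if the excess zeros are expected to depend on them.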
cheers,
Mollie
------------------------
Mollie Brooks, PhD
Postdoctoral Researcher, Population Ecology Research Group
Department of Evolutionary Biology & Environmental Studies, University of Zürich
http://www.popecol.org/team/mollie-brooks/
> On 23 Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
>
> Thanks, great information, that is really helpful.
>
> I agree that those are different things. However, when using a random
> effect for overdispersion, I can simulate the same number of zero
> outcomes (~95%).
>
> On 23.06.2016 15:50, Thierry Onkelinx wrote:
>> Be careful when using overdispersion to model zero-inflation. Those
>> are two different things.
>>
>> I've put some information together in
>> http://rpubs.com/INBOstats/zeroinflation
>>
>> ir. Thierry Onkelinx
>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
>> and Forest
>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
>> Kliniekstraat 25
>> 1070 Anderlecht
>> Belgium
>>
>> To call in the statistician after the experiment is done may be no
>> more than asking him to perform a post-mortem examination: he may be
>> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
>> The plural of anecdote is not data. ~ Roger Brinner
>> The combination of some data and an aching desire for an answer does
>> not ensure that a reasonable answer can be extracted from a given body
>> of data. ~ John Tukey
>>
>> 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>
>> Thanks! Accounting for overdispersion seems to be really important;
>> once I do that, the zeros can be captured well.
>>
>>
>> On 23.06.2016 11:50, Thierry Onkelinx wrote:
>>> Dear Philipp,
>>>
>>> 1. Fit a Poisson model to the data.
>>> 2. Simulate a new response vector for the dataset according to
>>> the model.
>>> 3. Count the number of zeros in the simulated response vector.
>>> 4. Repeat steps 2 and 3 a decent number of times and plot a
>>> histogram of the number of zeros in the simulations. If the
>>> number of zeros in the original dataset is larger than those in
>>> the simulations, then the model can't capture all the zeros. In that
>>> case, first try to update the model and repeat the procedure. If
>>> that fails, look for zero-inflated models.
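
The four steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, a plain glm stands in for the mixed model, and the names (x, y, zeros_sim, zeros_obs) are hypothetical.

```r
set.seed(1)
n <- 2000
x <- rnorm(n)

# Hypothetical response: 50% structural zeros mixed with Poisson counts
y <- rbinom(n, size = 1, prob = 0.5) * rpois(n, lambda = exp(0.5 + 0.3 * x))

# Step 1: fit a Poisson model to the data
fit <- glm(y ~ x, family = poisson)

# Step 2: simulate new response vectors from the fitted model
# (simulate() returns a data frame with one column per simulation)
sims <- simulate(fit, nsim = 500)

# Step 3: count the zeros in each simulated response vector
zeros_sim <- colSums(sims == 0)
zeros_obs <- sum(y == 0)

# Step 4: histogram of simulated zero counts, observed count marked in red
hist(zeros_sim, xlim = range(c(zeros_sim, zeros_obs)),
     main = "Zeros in simulated responses", xlab = "Number of zeros")
abline(v = zeros_obs, col = "red")

# Share of simulations with at least as many zeros as the data
mean(zeros_sim >= zeros_obs)
```

If the red line sits far to the right of the histogram, the fitted model cannot reproduce the observed zeros. For a glmer fit, simulate() is also available, so the same counting and plotting should carry over.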
>>>
>>> Best regards,
>>>
>>> ir. Thierry Onkelinx
>>>
>>> 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>
>>> Thanks Thierry - That totally makes sense. Is there some way of
>>> checking that formally, other than thinking about the setting and
>>> underlying processes?
>>>
>>> On 23.06.2016 11:04, Thierry Onkelinx wrote:
>>>> Dear Philipp,
>>>>
>>>> Do you have just lots of zeros, or more zeros than the Poisson
>>>> distribution can explain? Those are two different things. The example
>>>> below generates data from a Poisson distribution and has 99% zeros
>>>> but no zero-inflation. The second example has only 1% zeros but is
>>>> clearly zero-inflated.
>>>>
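>>>> set.seed(1)
>>>> n <- 1e8
>>>> # Plain Poisson with a small mean: about 99% zeros, no zero-inflation
>>>> sim <- rpois(n, lambda = 0.01)
>>>> mean(sim == 0)
>>>> hist(sim)
>>>>
>>>> # Mixture of 1% structural zeros and a Poisson with a large mean:
>>>> # only about 1% zeros, yet clearly zero-inflated
>>>> sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)
>>>> mean(sim.infl == 0)
>>>> hist(sim.infl)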
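
Both zero fractions follow from the zero probability of a zero-inflated Poisson. The symbols are notation introduced here, not part of the code: \pi is the probability of a structural zero and \lambda is the Poisson mean.

```latex
\begin{align*}
P(Y = 0) &= \pi + (1 - \pi)\, e^{-\lambda} \\
\text{first example } (\pi = 0,\ \lambda = 0.01):\quad
P(Y = 0) &= e^{-0.01} \approx 0.990 \\
\text{second example } (\pi = 0.01,\ \lambda = 1000):\quad
P(Y = 0) &= 0.01 + 0.99\, e^{-1000} \approx 0.010
\end{align*}
```

In the first case every zero is explained by the Poisson part; in the second, essentially every zero is structural, because a Poisson with mean 1000 almost never produces a zero.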
>>>>
>>>> So before looking for zero-inflated models, try to model the zeros.
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> ir. Thierry Onkelinx
>>>>
>>>> 2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>>
>>>> Dear group - I am currently fitting a Poisson glmer where I have
>>>> an excess of outcomes that are zero (>95%). I am now debating
>>>> how to proceed and came up with three options:
>>>>
>>>> 1.) Just fit a regular glmer to the complete data. I am not fully
>>>> sure how to interpret the coefficients then: are they mostly
>>>> optimized towards distinguishing zero from non-zero, or do they also
>>>> capture the differences among the non-zero outcomes?
>>>>
>>>> 2.) Leave all zeros out of the data and fit a glmer to only those
>>>> outcomes that are non-zero. Then, however, I would only learn about
>>>> differences in the non-zero outcomes.
>>>>
>>>> 3.) Use a zero-inflated Poisson model. My data is quite
>>>> large-scale, so I am currently playing around with the EM
>>>> implementation of Bolker et al. that alternates between fitting a
>>>> glmer with data that are weighted according to their zero
>>>> probability, and fitting a logistic regression for the probability
>>>> that a data point is zero. The method is elaborated for the OWL
>>>> data in:
>>>>
>>>> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
>>>>
>>>> I am not fully sure how to interpret the results for the
>>>> zero-inflated version though. Would I need to interpret the
>>>> coefficients of the glmer in the same way as I would for my
>>>> idea 2)? And then, on top of that, interpret the
>>>> coefficients of the logistic regression regarding whether
>>>> something is in the perfect or imperfect state? I am also not
>>>> quite sure what the common approach for the zformula is here. The
>>>> OWL elaborations only use zformula=z~1, so no random effect; I
>>>> would use the same formula as for the glmer.
>>>>
>>>> I would appreciate some help and pointers.
>>>>
>>>> Thanks!
>>>> Philipp
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models