[R-sig-ME] Question about zero-inflated Poisson glmer

Thu Jun 23 18:47:29 CEST 2016

Hi Philipp,

You could also try fitting the model with and without zero-inflation (ZI) using either glmmADMB or glmmTMB, then compare the AICs. I believe model selection is useful for this, but I could be missing something, since the simulation procedure that Thierry described seems to be recommended more often.

https://github.com/glmmTMB/glmmTMB
http://glmmadmb.r-forge.r-project.org

glmmTMB is still in the development phase, but we’ve done a lot of testing.
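
A minimal sketch of that comparison, assuming glmmTMB is installed. The data are simulated and the names (d, g, x, y, m_pois, m_zip) are made up for illustration; substitute your own data and formula.

```r
library(glmmTMB)

set.seed(42)
n_groups <- 30
n_per    <- 40
d <- data.frame(
  g = factor(rep(seq_len(n_groups), each = n_per)),  # grouping factor
  x = rnorm(n_groups * n_per)                        # one covariate
)

# Hypothetical data: Poisson counts with a random intercept per group,
# plus 40% structural zeros on top
b  <- rnorm(n_groups, sd = 0.5)
mu <- exp(0.5 + 0.3 * d$x + b[as.integer(d$g)])
structural_zero <- rbinom(nrow(d), size = 1, prob = 0.4)
d$y <- ifelse(structural_zero == 1, 0L, rpois(nrow(d), mu))

# Same conditional model, without and with a constant zero-inflation term
m_pois <- glmmTMB(y ~ x + (1 | g), data = d, family = poisson)
m_zip  <- glmmTMB(y ~ x + (1 | g), data = d, family = poisson,
                  ziformula = ~ 1)

# Lower AIC indicates the better-supported model
AIC(m_pois, m_zip)
```

Here ziformula = ~ 1 fits a single zero-inflation probability; covariates or random effects can be put in that formula as well.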

cheers,
Mollie

------------------------
Mollie Brooks, PhD
Postdoctoral Researcher, Population Ecology Research Group
Department of Evolutionary Biology & Environmental Studies, University of Zürich
http://www.popecol.org/team/mollie-brooks/

> On 23 Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
> 
> Thanks, great information, that is really helpful.
> 
> I agree that those are different things, however when using a random 
> effect for overdispersion, I can simulate the same number of zero 
> outcomes (~95%).
> 
> On 23.06.2016 15:50, Thierry Onkelinx wrote:
>> Be careful when using overdispersion to model zero-inflation. Those 
>> are two different things.
>> 
>> I've put some information together in 
>> http://rpubs.com/INBOstats/zeroinflation
>> 
>> ir. Thierry Onkelinx
>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature 
>> and Forest
>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
>> Kliniekstraat 25
>> 1070 Anderlecht
>> Belgium
>> 
>> To call in the statistician after the experiment is done may be no 
>> more than asking him to perform a post-mortem examination: he may be 
>> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
>> The plural of anecdote is not data. ~ Roger Brinner
>> The combination of some data and an aching desire for an answer does 
>> not ensure that a reasonable answer can be extracted from a given body 
>> of data. ~ John Tukey
>> 
>> 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com>:
>> 
>>    Thanks! Actually, accounting for overdispersion seems to be really
>>    important: once I do that, the zeros are captured well.
>> 
>> 
>>    On 23.06.2016 11:50, Thierry Onkelinx wrote:
>>>    Dear Philipp,
>>> 
>>>    1. Fit a Poisson model to the data.
>>>    2. Simulate a new response vector for the dataset according to
>>>    the model.
>>>    3. Count the number of zeros in the simulated response vector.
>>>    4. Repeat steps 2 and 3 a decent number of times and plot a
>>>    histogram of the number of zeros in the simulations. If the
>>>    number of zeros in the original dataset is larger than those in
>>>    the simulations, then the model can't capture all the zeros. In
>>>    that case, first try to update the model and repeat the
>>>    procedure. If that fails, look at zero-inflated models.
>>> 
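
The four steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, the names (x, y, fit, zeros_sim, zeros_obs) are hypothetical, and a plain glm stands in for the mixed model to keep the example self-contained.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
# Hypothetical data: Poisson counts with 30% structural zeros added
y <- rbinom(n, size = 1, prob = 0.7) * rpois(n, lambda = exp(0.5 + 0.4 * x))

# Step 1: fit a Poisson model
fit <- glm(y ~ x, family = poisson)

# Steps 2 and 4: simulate many response vectors from the fitted model
sims <- simulate(fit, nsim = 1000)

# Step 3: count the zeros in each simulated response vector
zeros_sim <- colSums(sims == 0)
zeros_obs <- sum(y == 0)

# Histogram of simulated zero counts, observed count marked in red
hist(zeros_sim, xlim = range(c(zeros_sim, zeros_obs)),
     main = "Zeros in simulated responses", xlab = "Number of zeros")
abline(v = zeros_obs, col = "red", lwd = 2)

# Share of simulations with at least as many zeros as observed
mean(zeros_sim >= zeros_obs)
```

lme4 also provides a simulate() method for fitted glmer models, so the same pattern should carry over to the mixed model.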
>>>    Best regards,
>>> 
>>>    ir. Thierry Onkelinx
>>> 
>>>    2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>> 
>>>        Thanks Thierry - that totally makes sense. Is there some way
>>>        of formally checking that, other than thinking about the
>>>        setting and the underlying processes?
>>> 
>>>        On 23.06.2016 11:04, Thierry Onkelinx wrote:
>>>> Dear Philipp,
>>>> 
>>>> Do you have just lots of zeros, or more zeros than the Poisson
>>>> distribution can explain? Those are two different things. The
>>>> first example below generates data from a Poisson distribution and
>>>> has 99% zeros but no zero-inflation. The second example has only
>>>> 1% zeros but is clearly zero-inflated.
>>>> 
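>>>> set.seed(1)
>>>> n <- 1e8
>>>> # Plain Poisson with a small mean: about 99% zeros, no zero-inflation
>>>> sim <- rpois(n, lambda = 0.01)
>>>> mean(sim == 0)
>>>> hist(sim)
>>>> 
>>>> # Poisson(1000) counts multiplied by a Bernoulli(0.99) indicator:
>>>> # only about 1% zeros, essentially all of them excess zeros
>>>> sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)
>>>> mean(sim.infl == 0)
>>>> hist(sim.infl)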
>>>> 
>>>> So before looking for zero-inflated models, try to model the zeros.
>>>> 
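
One rough way to put numbers on that distinction is to compare the observed share of zeros with the share a Poisson distribution with the same mean would produce, exp(-mean). The sketch below uses a smaller n than the example above and a hypothetical helper, zero_check(); it ignores covariates and random effects, so it is a first look rather than a test.

```r
set.seed(1)
n <- 1e5

sim      <- rpois(n, lambda = 0.01)
sim_infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)

# Observed proportion of zeros versus the proportion expected under a
# Poisson distribution with the same mean: P(Y = 0) = exp(-mean)
zero_check <- function(y) {
  c(observed = mean(y == 0), expected = exp(-mean(y)))
}

zero_check(sim)       # the two values are close: many zeros, no inflation
zero_check(sim_infl)  # observed far exceeds expected: zero-inflated
```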
>>>> Best regards,
>>>> 
>>>> 
>>>> ir. Thierry Onkelinx
>>>> 
>>>> 2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>> 
>>>>    Dear group - I am currently fitting a Poisson glmer where I have
>>>>    an excess of outcomes that are zero (>95%). I am now debating
>>>>    how to proceed and came up with three options:
>>>> 
>>>>    1.) Just fit a regular glmer to the complete data. I am not fully
>>>>    sure how to interpret the coefficients then: are they mostly
>>>>    optimized towards distinguishing zero from non-zero outcomes, or
>>>>    do they also capture the differences among the non-zero outcomes?
>>>> 
>>>>    2.) Leave all zeros out of the data and fit a glmer to only those
>>>>    outcomes that are non-zero. Then, I would only learn about
>>>>    differences in the non-zero outcomes though.
>>>> 
>>>>    3.) Use a zero-inflated Poisson model. My data is quite
>>>>    large-scale, so I am currently playing around with the EM
>>>>    implementation of Bolker et al. that alternates between fitting a
>>>>    glmer with data that are weighted according to their zero
>>>>    probability, and fitting a logistic regression for the probability
>>>>    that a data point is zero. The method is elaborated for the OWL
>>>>    data in:
>>>> 
>>>>    https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
>>>> 
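
To make the alternation concrete, here is a minimal base-R sketch of an EM loop for a zero-inflated Poisson model. It is not the implementation from the owls write-up: glm replaces glmer (so there are no random effects), the zero-inflation part is intercept-only, the data are simulated, and all names (w, fit_zero, fit_count, pi_hat, mu_hat) are hypothetical.

```r
set.seed(123)
n <- 2000
x <- rnorm(n)
# Hypothetical data: 40% structural zeros, otherwise Poisson counts
y <- ifelse(rbinom(n, size = 1, prob = 0.4) == 1,
            0L, rpois(n, lambda = exp(1 + 0.5 * x)))

# w[i] = current probability that observation i is a structural zero;
# non-zero counts can never be structural zeros
w <- ifelse(y == 0, 0.5, 0)

for (iter in 1:200) {
  # M-step, part 1: logistic regression for the structural-zero probability
  # (quasibinomial gives the same estimates as binomial but accepts the
  # fractional responses without warnings)
  fit_zero <- glm(w ~ 1, family = quasibinomial)
  # M-step, part 2: Poisson regression weighted by P(not a structural zero)
  fit_count <- glm(y ~ x, family = poisson, weights = 1 - w)

  pi_hat <- fitted(fit_zero)
  mu_hat <- fitted(fit_count)

  # E-step: update the structural-zero probabilities for the observed zeros
  w_new <- ifelse(y == 0,
                  pi_hat / (pi_hat + (1 - pi_hat) * exp(-mu_hat)),
                  0)
  converged <- max(abs(w_new - w)) < 1e-6
  w <- w_new
  if (converged) break
}

plogis(coef(fit_zero))  # estimated zero-inflation probability
coef(fit_count)         # coefficients of the Poisson (count) part
```

With this simulated data the estimates should land near the generating values (0.4 for the zero-inflation probability, 1 and 0.5 for the count coefficients).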
>>>>    I am not fully sure how to interpret the results for the
>>>>    zero-inflated version though. Would I need to interpret the
>>>>    coefficients of the glmer in the same way as I would for my
>>>>    idea 2)? And then, on top of that, interpret the coefficients of
>>>>    the logistic regression regarding whether something is in the
>>>>    perfect or imperfect state? I am also not quite sure what the
>>>>    common approach for the zformula is here. The OWL elaborations
>>>>    only use zformula=z~1, so no random effect; I would use the same
>>>>    formula as for the glmer.
>>>> 
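
For reference when interpreting the two sets of coefficients, the standard zero-inflated Poisson mixture can be written as below. The symbols are notation introduced here, not taken from the thread: \pi_i is the structural-zero probability, \mu_i the mean of the Poisson part, and x_i, z_i are the covariate vectors of the two parts (with random effects, the corresponding terms are added to the linear predictors).

```latex
\begin{aligned}
P(Y_i = 0) &= \pi_i + (1 - \pi_i)\, e^{-\mu_i}, \\
P(Y_i = k) &= (1 - \pi_i)\, \frac{\mu_i^{k}\, e^{-\mu_i}}{k!},
  \qquad k = 1, 2, \dots, \\
\operatorname{logit}(\pi_i) &= z_i^{\top} \gamma,
  \qquad \log(\mu_i) = x_i^{\top} \beta,
  \qquad \mathrm{E}[Y_i] = (1 - \pi_i)\, \mu_i .
\end{aligned}
```

Under this formulation the count coefficients \beta describe the mean of a Poisson process that can itself produce zeros, so they are not equivalent to coefficients from a model fitted only to the non-zero outcomes as in option 2).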
>>>>    I would appreciate some help and pointers.
>>>> 
>>>>    Thanks!
>>>>    Philipp
>>>> 
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models