[R-sig-ME] Question about zero-inflated Poisson glmer

Fri Jun 24 14:35:22 CEST 2016

It indeed seems to run quite fast; had some trouble installing, but 
works now on my 3.3 R setup.

One question I have is regarding the specification of dispersion as I 
need to specify the dispformula. What is the difference here between 
just specifying fixed effects vs. also the random effects?

On 23.06.2016 23:07, Mollie Brooks wrote:
> glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB") 
> and it was 2.3 times faster than glmer for one simple GLMM.
>
>
>> On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com 
>> <mailto:killver at gmail.com>> wrote:
>>
>> Did try glmmADMB but unfortunately it is way too slow for my data.
>>
>> Did not know about glmmTMB, will try it out. Does it work with 
>> crossed random effects and how does it scale with more data? I will 
>> check the docu and try it though. Thanks for the info.
>>
>> On 23.06.2016 19:14, Ben Bolker wrote:
>>>   I would also comment that glmmTMB is likely to be much faster than the
>>> lme4-based EM approach ...
>>>
>>>   cheers
>>>     Ben B.
>>>
>>> On 16-06-23 12:47 PM, Mollie Brooks wrote:
>>>> Hi Philipp,
>>>>
>>>> You could also try fitting the model with and without ZI using either
>>>> glmmADMB or glmmTMB. Then compare the AICs. I believe model selection
>>>> is useful for this, but I could be missing something since the
>>>> simulation procedure that Thierry described seems to recommended more
>>>> often.
>>>>
>>>> https://github.com/glmmTMB/glmmTMB
>>>> http://glmmadmb.r-forge.r-project.org
>>>>
>>>> glmmTMB is still in the development phase, but we’ve done a lot of
>>>> testing.
>>>>
>>>> cheers, Mollie
>>>>
>>>> ------------------------ Mollie Brooks, PhD Postdoctoral Researcher,
>>>> Population Ecology Research Group Department of Evolutionary Biology
>>>> & Environmental Studies, University of Zürich
>>>> http://www.popecol.org/team/mollie-brooks/
>>>>
>>>>
>>>>> On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
>>>>>
>>>>> Thanks, great information, that is really helpful.
>>>>>
>>>>> I agree that those are different things, however when using a
>>>>> random effect for overdispersion, I can simulate the same number of
>>>>> zero outcomes (~95%).
>>>>>
>>>>> On 23.06.2016 15:50, Thierry Onkelinx wrote:
>>>>>> Be careful when using overdispersion to model zero-inflation.
>>>>>> Those are two different things.
>>>>>>
>>>>>> I've put some information together in
>>>>>> http://rpubs.com/INBOstats/zeroinflation
>>>>>>
>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>> Research Institute for Nature and Forest team Biometrie &
>>>>>> Kwaliteitszorg / team Biometrics & Quality Assurance
>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>
>>>>>> To call in the statistician after the experiment is done may be
>>>>>> no more than asking him to perform a post-mortem examination: he
>>>>>> may be able to say what the experiment died of. ~ Sir Ronald
>>>>>> Aylmer Fisher The plural of anecdote is not data. ~ Roger
>>>>>> Brinner The combination of some data and an aching desire for an
>>>>>> answer does not ensure that a reasonable answer can be extracted
>>>>>> from a given body of data. ~ John Tukey
>>>>>>
>>>>>> 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com
>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>:
>>>>>>
>>>>>> Thanks! Actually, accounting for overdispersion is super
>>>>>> important as it seems, then the zeros can be captured well.
>>>>>>
>>>>>>
>>>>>> On 23.06.2016 11:50, Thierry Onkelinx wrote:
>>>>>>> Dear Philipp,
>>>>>>>
>>>>>>> 1. Fit a Poisson model to the data. 2. Simulate a new response
>>>>>>> vector for the dataset according to the model. 3. Count the
>>>>>>> number of zero's in the simulated response vector. 4. Repeat
>>>>>>> step 2 and 3 a decent number of time and plot a histogram of
>>>>>>> the number of zero's in the simulation. If the number of zero's
>>>>>>> in the original dataset is larger than those in the
>>>>>>> simulations, then the model can't capture all zero's. In such
>>>>>>> case, first try to update the model and repeat the procedure.
>>>>>>> If that fails, look for zero-inflated models.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>>> Research Institute for Nature and Forest team Biometrie &
>>>>>>> Kwaliteitszorg / team Biometrics & Quality Assurance
>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>
>>>>>>> To call in the statistician after the experiment is done may
>>>>>>> be no more than asking him to perform a post-mortem
>>>>>>> examination: he may be able to say what the experiment died of.
>>>>>>> ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data.
>>>>>>> ~ Roger Brinner The combination of some data and an aching
>>>>>>> desire for an answer does not ensure that a reasonable answer
>>>>>>> can be extracted from a given body of data. ~ John Tukey
>>>>>>>
>>>>>>> 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com
>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>:
>>>>>>>
>>>>>>> Thanks Thierry - That totally makes sense. Is there some way of
>>>>>>> formally checking that, except thinking about the setting and
>>>>>>> underlying processes?
>>>>>>>
>>>>>>> On 23.06.2016 11:04, Thierry Onkelinx wrote:
>>>>>>>> Dear Philipp,
>>>>>>>>
>>>>>>>> Do you have just lots of zero's, or more zero's than the
>>>>>>> Poisson
>>>>>>>> distribution can explain? Those are two different things.
>>>>>>> The example
>>>>>>>> below generates data from a Poisson distribution and has
>>>>>>> 99% zero's
>>>>>>>> but no zero-inflation. The second example has only 1%
>>>>>>> zero's but is
>>>>>>>> clearly zero-inflated.
>>>>>>>>
>>>>>>>> set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim
>>>>>>>> == 0) hist(sim)
>>>>>>>>
>>>>>>>> sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
>>>>>>> lambda = 1000)
>>>>>>>> mean(sim.infl == 0) hist(sim.infl)
>>>>>>>>
>>>>>>>> So before looking for zero-inflated models, try to model
>>>>>>> the zero's.
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>>
>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>>>> Research Institute
>>>>>>> for Nature
>>>>>>>> and Forest team Biometrie & Kwaliteitszorg / team Biometrics
>>>>>>>> & Quality
>>>>>>> Assurance
>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>
>>>>>>>> To call in the statistician after the experiment is done
>>>>>>> may be no
>>>>>>>> more than asking him to perform a post-mortem examination:
>>>>>>> he may be
>>>>>>>> able to say what the experiment died of. ~ Sir Ronald
>>>>>>> Aylmer Fisher
>>>>>>>> The plural of anecdote is not data. ~ Roger Brinner The
>>>>>>>> combination of some data and an aching desire for an
>>>>>>> answer does
>>>>>>>> not ensure that a reasonable answer can be extracted from a
>>>>>>> given body
>>>>>>>> of data. ~ John Tukey
>>>>>>>>
>>>>>>>> 2016-06-23 10:07 GMT+02:00 Philipp Singer
>>>>>>> <killver at gmail.com <mailto:killver at gmail.com>
>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>:
>>>>>>>>
>>>>>>>> Dear group - I am currently fitting a Poisson glmer
>>>>>>> where I have
>>>>>>>> an excess of outcomes that are zero (>95%). I am now
>>>>>>> debating on
>>>>>>>> how to proceed and came up with three options:
>>>>>>>>
>>>>>>>> 1.) Just fit a regular glmer to the complete data. I am
>>>>>>> not fully
>>>>>>>> sure how interpret the coefficients then, are they more
>>>>>>> optimizing
>>>>>>>> towards distinguishing zero and non-zero, or also
>>>>>>> capturing the
>>>>>>>> differences in those outcomes that are non-zero?
>>>>>>>>
>>>>>>>> 2.) Leave all zeros out of the data and fit a glmer to
>>>>>>> only those
>>>>>>>> outcomes that are non-zero. Then, I would only learn about
>>>>>>>> differences in the non-zero outcomes though.
>>>>>>>>
>>>>>>>> 3.) Use a zero-inflated Poisson model. My data is quite
>>>>>>>> large-scale, so I am currently playing around with the EM
>>>>>>>> implementation of Bolker et al. that alternates between
>>>>>>> fitting a
>>>>>>>> glmer with data that are weighted according to their zero
>>>>>>>> probability, and fitting a logistic regression for the
>>>>>>> probability
>>>>>>>> that a data point is zero. The method is elaborated for
>>>>>>> the OWL
>>>>>>>> data in:
>>>>>>>>
>>>>>>> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
>>>>>>> <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
>>>>>>>>
>>> I am not fully sure how to interpret the results for the
>>>>>>>> zero-inflated version though. Would I need to interpret the
>>>>>>>> coefficients for the result of the glmer similar to as
>>>>>>> I would do
>>>>>>>> for my idea of 2)? And then on top of that interpret the
>>>>>>>> coefficients for the logistic regression regarding whether
>>>>>>>> something is in the perfect or imperfect state? I am
>>>>>>> also not
>>>>>>>> quite sure what the common approach for the zformula is
>>>>>>> here. The
>>>>>>>> OWL elaborations only use zformula=z~1, so no random
>>>>>>> effect; I
>>>>>>>> would use the same formula as for the glmer.
>>>>>>>>
>>>>>>>> I am appreciating some help and pointers.
>>>>>>>>
>>>>>>>> Thanks! Philipp
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-mixed-models at r-project.org 
>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>> <mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>> <mailto:R-sig-mixed-models at r-project.org>>> mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> R-sig-mixed-models at r-project.org 
>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>> <mailto:R-sig-mixed-models at r-project.org>> mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-mixed-models at r-project.org 
>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org 
>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org 
>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org 
>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]