[R-sig-ME] Question about zero-inflated Poisson glmer

Mon Jun 27 14:59:00 CEST 2016

I have now played around more with the data an the models both using lme4
and glmmTMB.

I can report the following:

Modeling the data with a zero-inflated Poisson improves the model
significantly. However, when calling predict and simulating rpoissons, I
end up with nearly no values that are zero (in the original data there are
96% zero).

When I model the data with overdisperion by including an observation-level
random effect, I can also improve the model (not surprisingly due to the
random effect). When I predict outcomes by ignoring the observation-level
random effect (in lme4), I receive bad prediction if I compare it to the
original data. While many zeros can be captured (of course), the positive
outcomes can not be captured well.

Combining zero inflation and overdispersion further improves the model, but
I can only do that with glmmTMB and then have troubles doing predictions
ignoring the observation-level random effect.

Another side question:

In lme4, when I do:

m = glm(x~1,family="poisson")
rpois(n=len(data),lambda=predict(m, type='response',re.form=NA)

vs.

simulate(1,m,re.form=NA)

I receive different outcomes? Do I understand these function wrongly?

Would appreciate some more help/pointers!

Thanks,
Philipp

2016-06-24 15:52 GMT+02:00 Philipp Singer <killver at gmail.com>:

> Thanks - I started an issue there to answer some of my questions.
>
> Regarding the installation: I was trying to somehow do it in anaconda with
> a specific R kernel and had some issues. I am trying to resort that with
> the anaconda guys though, if I have a tutorial on how to properly setup
> glmmTMB in anaconda, I will let you know. The install worked fine in my
> standard R environment.
>
>
> On 24.06.2016 15:40, Ben Bolker wrote:
>
>>   Probably for now the glmmTMB issues page is best.
>>
>>   When you go there:
>>
>>    - details on installation problems/hiccups would be useful
>>    - a reproducible example for the problem listed below would be useful
>>    - dispformula is for allowing dispersion/residual variance to vary
>> with covariates (i.e., modeling heteroscedasticity)
>>
>>    cheers
>>      Ben Bolker
>>
>>
>> On 16-06-24 09:13 AM, Philipp Singer wrote:
>>
>>> Update, I tried it like that, but receive an error message.
>>>
>>> Warning message:
>>> In nlminb(start = par, objective = fn, gradient = gr): NA/NaN function
>>> evaluation
>>>
>>> Error in solve.default(hessian.fixed): Lapack routine dgesv: system is
>>> exactly singular: U[3,3] = 0
>>> Traceback:
>>>
>>> 1. glmmTMB(y ~ 1 + x + (1 | b),
>>>    .     data = data, family = "poisson", dispformula = ~1 + x)
>>> 2. sdreport(obj)
>>> 3. solve(hessian.fixed)
>>> 4. solve(hessian.fixed)
>>> 5. solve.default(hessian.fixed)
>>>
>>> Any ideas on that?
>>>
>>> BTW: Is it fine to post glmmTMB questions here, or should I rather use
>>> the github issue page, or is there maybe a dedicated mailing list?
>>>
>>> Thanks,
>>> Philipp
>>>
>>> On 24.06.2016 14:35, Philipp Singer wrote:
>>>
>>>> It indeed seems to run quite fast; had some trouble installing, but
>>>> works now on my 3.3 R setup.
>>>>
>>>> One question I have is regarding the specification of dispersion as I
>>>> need to specify the dispformula. What is the difference here between
>>>> just specifying fixed effects vs. also the random effects?
>>>>
>>>> On 23.06.2016 23:07, Mollie Brooks wrote:
>>>>
>>>>> glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB")
>>>>> and it was 2.3 times faster than glmer for one simple GLMM.
>>>>>
>>>>>
>>>>> On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote:
>>>>>>
>>>>>> Did try glmmADMB but unfortunately it is way too slow for my data.
>>>>>>
>>>>>> Did not know about glmmTMB, will try it out. Does it work with
>>>>>> crossed random effects and how does it scale with more data? I will
>>>>>> check the docu and try it though. Thanks for the info.
>>>>>>
>>>>>> On 23.06.2016 19:14, Ben Bolker wrote:
>>>>>>
>>>>>>>    I would also comment that glmmTMB is likely to be much faster
>>>>>>> than the
>>>>>>> lme4-based EM approach ...
>>>>>>>
>>>>>>>    cheers
>>>>>>>      Ben B.
>>>>>>>
>>>>>>> On 16-06-23 12:47 PM, Mollie Brooks wrote:
>>>>>>>
>>>>>>>> Hi Philipp,
>>>>>>>>
>>>>>>>> You could also try fitting the model with and without ZI using
>>>>>>>> either
>>>>>>>> glmmADMB or glmmTMB. Then compare the AICs. I believe model
>>>>>>>> selection
>>>>>>>> is useful for this, but I could be missing something since the
>>>>>>>> simulation procedure that Thierry described seems to recommended
>>>>>>>> more
>>>>>>>> often.
>>>>>>>>
>>>>>>>> https://github.com/glmmTMB/glmmTMB
>>>>>>>> http://glmmadmb.r-forge.r-project.org
>>>>>>>>
>>>>>>>> glmmTMB is still in the development phase, but we’ve done a lot of
>>>>>>>> testing.
>>>>>>>>
>>>>>>>> cheers, Mollie
>>>>>>>>
>>>>>>>> ------------------------ Mollie Brooks, PhD Postdoctoral Researcher,
>>>>>>>> Population Ecology Research Group Department of Evolutionary Biology
>>>>>>>> & Environmental Studies, University of Zürich
>>>>>>>> http://www.popecol.org/team/mollie-brooks/
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Thanks, great information, that is really helpful.
>>>>>>>>>
>>>>>>>>> I agree that those are different things, however when using a
>>>>>>>>> random effect for overdispersion, I can simulate the same number of
>>>>>>>>> zero outcomes (~95%).
>>>>>>>>>
>>>>>>>>> On 23.06.2016 15:50, Thierry Onkelinx wrote:
>>>>>>>>>
>>>>>>>>>> Be careful when using overdispersion to model zero-inflation.
>>>>>>>>>> Those are two different things.
>>>>>>>>>>
>>>>>>>>>> I've put some information together in
>>>>>>>>>> http://rpubs.com/INBOstats/zeroinflation
>>>>>>>>>>
>>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>>>>>> Research Institute for Nature and Forest team Biometrie &
>>>>>>>>>> Kwaliteitszorg / team Biometrics & Quality Assurance
>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>
>>>>>>>>>> To call in the statistician after the experiment is done may be
>>>>>>>>>> no more than asking him to perform a post-mortem examination: he
>>>>>>>>>> may be able to say what the experiment died of. ~ Sir Ronald
>>>>>>>>>> Aylmer Fisher The plural of anecdote is not data. ~ Roger
>>>>>>>>>> Brinner The combination of some data and an aching desire for an
>>>>>>>>>> answer does not ensure that a reasonable answer can be extracted
>>>>>>>>>> from a given body of data. ~ John Tukey
>>>>>>>>>>
>>>>>>>>>> 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com
>>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>:
>>>>>>>>>>
>>>>>>>>>> Thanks! Actually, accounting for overdispersion is super
>>>>>>>>>> important as it seems, then the zeros can be captured well.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 23.06.2016 11:50, Thierry Onkelinx wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear Philipp,
>>>>>>>>>>>
>>>>>>>>>>> 1. Fit a Poisson model to the data. 2. Simulate a new response
>>>>>>>>>>> vector for the dataset according to the model. 3. Count the
>>>>>>>>>>> number of zero's in the simulated response vector. 4. Repeat
>>>>>>>>>>> step 2 and 3 a decent number of time and plot a histogram of
>>>>>>>>>>> the number of zero's in the simulation. If the number of zero's
>>>>>>>>>>> in the original dataset is larger than those in the
>>>>>>>>>>> simulations, then the model can't capture all zero's. In such
>>>>>>>>>>> case, first try to update the model and repeat the procedure.
>>>>>>>>>>> If that fails, look for zero-inflated models.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>>>>>>> Research Institute for Nature and Forest team Biometrie &
>>>>>>>>>>> Kwaliteitszorg / team Biometrics & Quality Assurance
>>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>>
>>>>>>>>>>> To call in the statistician after the experiment is done may
>>>>>>>>>>> be no more than asking him to perform a post-mortem
>>>>>>>>>>> examination: he may be able to say what the experiment died of.
>>>>>>>>>>> ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data.
>>>>>>>>>>> ~ Roger Brinner The combination of some data and an aching
>>>>>>>>>>> desire for an answer does not ensure that a reasonable answer
>>>>>>>>>>> can be extracted from a given body of data. ~ John Tukey
>>>>>>>>>>>
>>>>>>>>>>> 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com
>>>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>:
>>>>>>>>>>>
>>>>>>>>>>> Thanks Thierry - That totally makes sense. Is there some way of
>>>>>>>>>>> formally checking that, except thinking about the setting and
>>>>>>>>>>> underlying processes?
>>>>>>>>>>>
>>>>>>>>>>> On 23.06.2016 11:04, Thierry Onkelinx wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dear Philipp,
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have just lots of zero's, or more zero's than the
>>>>>>>>>>>>
>>>>>>>>>>> Poisson
>>>>>>>>>>>
>>>>>>>>>>>> distribution can explain? Those are two different things.
>>>>>>>>>>>>
>>>>>>>>>>> The example
>>>>>>>>>>>
>>>>>>>>>>>> below generates data from a Poisson distribution and has
>>>>>>>>>>>>
>>>>>>>>>>> 99% zero's
>>>>>>>>>>>
>>>>>>>>>>>> but no zero-inflation. The second example has only 1%
>>>>>>>>>>>>
>>>>>>>>>>> zero's but is
>>>>>>>>>>>
>>>>>>>>>>>> clearly zero-inflated.
>>>>>>>>>>>>
>>>>>>>>>>>> set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim
>>>>>>>>>>>> == 0) hist(sim)
>>>>>>>>>>>>
>>>>>>>>>>>> sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
>>>>>>>>>>>>
>>>>>>>>>>> lambda = 1000)
>>>>>>>>>>>
>>>>>>>>>>>> mean(sim.infl == 0) hist(sim.infl)
>>>>>>>>>>>>
>>>>>>>>>>>> So before looking for zero-inflated models, try to model
>>>>>>>>>>>>
>>>>>>>>>>> the zero's.
>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>>>>>>>> Research Institute
>>>>>>>>>>>>
>>>>>>>>>>> for Nature
>>>>>>>>>>>
>>>>>>>>>>>> and Forest team Biometrie & Kwaliteitszorg / team Biometrics
>>>>>>>>>>>> & Quality
>>>>>>>>>>>>
>>>>>>>>>>> Assurance
>>>>>>>>>>>
>>>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>>>
>>>>>>>>>>>> To call in the statistician after the experiment is done
>>>>>>>>>>>>
>>>>>>>>>>> may be no
>>>>>>>>>>>
>>>>>>>>>>>> more than asking him to perform a post-mortem examination:
>>>>>>>>>>>>
>>>>>>>>>>> he may be
>>>>>>>>>>>
>>>>>>>>>>>> able to say what the experiment died of. ~ Sir Ronald
>>>>>>>>>>>>
>>>>>>>>>>> Aylmer Fisher
>>>>>>>>>>>
>>>>>>>>>>>> The plural of anecdote is not data. ~ Roger Brinner The
>>>>>>>>>>>> combination of some data and an aching desire for an
>>>>>>>>>>>>
>>>>>>>>>>> answer does
>>>>>>>>>>>
>>>>>>>>>>>> not ensure that a reasonable answer can be extracted from a
>>>>>>>>>>>>
>>>>>>>>>>> given body
>>>>>>>>>>>
>>>>>>>>>>>> of data. ~ John Tukey
>>>>>>>>>>>>
>>>>>>>>>>>> 2016-06-23 10:07 GMT+02:00 Philipp Singer
>>>>>>>>>>>>
>>>>>>>>>>> <killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
>>>>>>>>>>>
>>>>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>:
>>>>>>>>>>>>
>>>>>>>>>>>> Dear group - I am currently fitting a Poisson glmer
>>>>>>>>>>>>
>>>>>>>>>>> where I have
>>>>>>>>>>>
>>>>>>>>>>>> an excess of outcomes that are zero (>95%). I am now
>>>>>>>>>>>>
>>>>>>>>>>> debating on
>>>>>>>>>>>
>>>>>>>>>>>> how to proceed and came up with three options:
>>>>>>>>>>>>
>>>>>>>>>>>> 1.) Just fit a regular glmer to the complete data. I am
>>>>>>>>>>>>
>>>>>>>>>>> not fully
>>>>>>>>>>>
>>>>>>>>>>>> sure how interpret the coefficients then, are they more
>>>>>>>>>>>>
>>>>>>>>>>> optimizing
>>>>>>>>>>>
>>>>>>>>>>>> towards distinguishing zero and non-zero, or also
>>>>>>>>>>>>
>>>>>>>>>>> capturing the
>>>>>>>>>>>
>>>>>>>>>>>> differences in those outcomes that are non-zero?
>>>>>>>>>>>>
>>>>>>>>>>>> 2.) Leave all zeros out of the data and fit a glmer to
>>>>>>>>>>>>
>>>>>>>>>>> only those
>>>>>>>>>>>
>>>>>>>>>>>> outcomes that are non-zero. Then, I would only learn about
>>>>>>>>>>>> differences in the non-zero outcomes though.
>>>>>>>>>>>>
>>>>>>>>>>>> 3.) Use a zero-inflated Poisson model. My data is quite
>>>>>>>>>>>> large-scale, so I am currently playing around with the EM
>>>>>>>>>>>> implementation of Bolker et al. that alternates between
>>>>>>>>>>>>
>>>>>>>>>>> fitting a
>>>>>>>>>>>
>>>>>>>>>>>> glmer with data that are weighted according to their zero
>>>>>>>>>>>> probability, and fitting a logistic regression for the
>>>>>>>>>>>>
>>>>>>>>>>> probability
>>>>>>>>>>>
>>>>>>>>>>>> that a data point is zero. The method is elaborated for
>>>>>>>>>>>>
>>>>>>>>>>> the OWL
>>>>>>>>>>>
>>>>>>>>>>>> data in:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
>>>>>>>>>>> <
>>>>>>>>>>> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>> I am not fully sure how to interpret the results for the
>>>>>>>
>>>>>>>> zero-inflated version though. Would I need to interpret the
>>>>>>>>>>>> coefficients for the result of the glmer similar to as
>>>>>>>>>>>>
>>>>>>>>>>> I would do
>>>>>>>>>>>
>>>>>>>>>>>> for my idea of 2)? And then on top of that interpret the
>>>>>>>>>>>> coefficients for the logistic regression regarding whether
>>>>>>>>>>>> something is in the perfect or imperfect state? I am
>>>>>>>>>>>>
>>>>>>>>>>> also not
>>>>>>>>>>>
>>>>>>>>>>>> quite sure what the common approach for the zformula is
>>>>>>>>>>>>
>>>>>>>>>>> here. The
>>>>>>>>>>>
>>>>>>>>>>>> OWL elaborations only use zformula=z~1, so no random
>>>>>>>>>>>>
>>>>>>>>>>> effect; I
>>>>>>>>>>>
>>>>>>>>>>>> would use the same formula as for the glmer.
>>>>>>>>>>>>
>>>>>>>>>>>> I am appreciating some help and pointers.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks! Philipp
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> R-sig-mixed-models at r-project.org
>>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>
>>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>>> mailing list
>>>>>>>>>>>
>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> R-sig-mixed-models at r-project.org
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>> mailing list
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> R-sig-mixed-models at r-project.org
>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-mixed-models at r-project.org
>>>>>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>> R-sig-mixed-models at r-project.org
>>>>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-mixed-models at r-project.org
>>>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>
>>>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>

	[[alternative HTML version deleted]]