[R-sig-ME] Question about zero-inflated Poisson glmer

Fri Jun 24 15:52:27 CEST 2016

Thanks - I started an issue there to answer some of my questions.

Regarding the installation: I was trying to somehow do it in anaconda 
with a specific R kernel and had some issues. I am trying to resort that 
with the anaconda guys though, if I have a tutorial on how to properly 
setup glmmTMB in anaconda, I will let you know. The install worked fine 
in my standard R environment.

On 24.06.2016 15:40, Ben Bolker wrote:
>   Probably for now the glmmTMB issues page is best.
>
>   When you go there:
>
>    - details on installation problems/hiccups would be useful
>    - a reproducible example for the problem listed below would be useful
>    - dispformula is for allowing dispersion/residual variance to vary
> with covariates (i.e., modeling heteroscedasticity)
>
>    cheers
>      Ben Bolker
>
>
> On 16-06-24 09:13 AM, Philipp Singer wrote:
>> Update, I tried it like that, but receive an error message.
>>
>> Warning message:
>> In nlminb(start = par, objective = fn, gradient = gr): NA/NaN function evaluation
>>
>> Error in solve.default(hessian.fixed): Lapack routine dgesv: system is exactly singular: U[3,3] = 0
>> Traceback:
>>
>> 1. glmmTMB(y ~ 1 + x + (1 | b),
>>    .     data = data, family = "poisson", dispformula = ~1 + x)
>> 2. sdreport(obj)
>> 3. solve(hessian.fixed)
>> 4. solve(hessian.fixed)
>> 5. solve.default(hessian.fixed)
>>
>> Any ideas on that?
>>
>> BTW: Is it fine to post glmmTMB questions here, or should I rather use
>> the github issue page, or is there maybe a dedicated mailing list?
>>
>> Thanks,
>> Philipp
>>
>> On 24.06.2016 14:35, Philipp Singer wrote:
>>> It indeed seems to run quite fast; had some trouble installing, but
>>> works now on my 3.3 R setup.
>>>
>>> One question I have is regarding the specification of dispersion as I
>>> need to specify the dispformula. What is the difference here between
>>> just specifying fixed effects vs. also the random effects?
>>>
>>> On 23.06.2016 23:07, Mollie Brooks wrote:
>>>> glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB")
>>>> and it was 2.3 times faster than glmer for one simple GLMM.
>>>>
>>>>
>>>>> On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote:
>>>>>
>>>>> Did try glmmADMB but unfortunately it is way too slow for my data.
>>>>>
>>>>> Did not know about glmmTMB, will try it out. Does it work with
>>>>> crossed random effects and how does it scale with more data? I will
>>>>> check the docu and try it though. Thanks for the info.
>>>>>
>>>>> On 23.06.2016 19:14, Ben Bolker wrote:
>>>>>>    I would also comment that glmmTMB is likely to be much faster
>>>>>> than the
>>>>>> lme4-based EM approach ...
>>>>>>
>>>>>>    cheers
>>>>>>      Ben B.
>>>>>>
>>>>>> On 16-06-23 12:47 PM, Mollie Brooks wrote:
>>>>>>> Hi Philipp,
>>>>>>>
>>>>>>> You could also try fitting the model with and without ZI using either
>>>>>>> glmmADMB or glmmTMB. Then compare the AICs. I believe model selection
>>>>>>> is useful for this, but I could be missing something since the
>>>>>>> simulation procedure that Thierry described seems to recommended more
>>>>>>> often.
>>>>>>>
>>>>>>> https://github.com/glmmTMB/glmmTMB
>>>>>>> http://glmmadmb.r-forge.r-project.org
>>>>>>>
>>>>>>> glmmTMB is still in the development phase, but we’ve done a lot of
>>>>>>> testing.
>>>>>>>
>>>>>>> cheers, Mollie
>>>>>>>
>>>>>>> ------------------------ Mollie Brooks, PhD Postdoctoral Researcher,
>>>>>>> Population Ecology Research Group Department of Evolutionary Biology
>>>>>>> & Environmental Studies, University of Zürich
>>>>>>> http://www.popecol.org/team/mollie-brooks/
>>>>>>>
>>>>>>>
>>>>>>>> On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Thanks, great information, that is really helpful.
>>>>>>>>
>>>>>>>> I agree that those are different things, however when using a
>>>>>>>> random effect for overdispersion, I can simulate the same number of
>>>>>>>> zero outcomes (~95%).
>>>>>>>>
>>>>>>>> On 23.06.2016 15:50, Thierry Onkelinx wrote:
>>>>>>>>> Be careful when using overdispersion to model zero-inflation.
>>>>>>>>> Those are two different things.
>>>>>>>>>
>>>>>>>>> I've put some information together in
>>>>>>>>> http://rpubs.com/INBOstats/zeroinflation
>>>>>>>>>
>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>>>>> Research Institute for Nature and Forest team Biometrie &
>>>>>>>>> Kwaliteitszorg / team Biometrics & Quality Assurance
>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>
>>>>>>>>> To call in the statistician after the experiment is done may be
>>>>>>>>> no more than asking him to perform a post-mortem examination: he
>>>>>>>>> may be able to say what the experiment died of. ~ Sir Ronald
>>>>>>>>> Aylmer Fisher The plural of anecdote is not data. ~ Roger
>>>>>>>>> Brinner The combination of some data and an aching desire for an
>>>>>>>>> answer does not ensure that a reasonable answer can be extracted
>>>>>>>>> from a given body of data. ~ John Tukey
>>>>>>>>>
>>>>>>>>> 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com
>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>:
>>>>>>>>>
>>>>>>>>> Thanks! Actually, accounting for overdispersion is super
>>>>>>>>> important as it seems, then the zeros can be captured well.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 23.06.2016 11:50, Thierry Onkelinx wrote:
>>>>>>>>>> Dear Philipp,
>>>>>>>>>>
>>>>>>>>>> 1. Fit a Poisson model to the data. 2. Simulate a new response
>>>>>>>>>> vector for the dataset according to the model. 3. Count the
>>>>>>>>>> number of zero's in the simulated response vector. 4. Repeat
>>>>>>>>>> step 2 and 3 a decent number of time and plot a histogram of
>>>>>>>>>> the number of zero's in the simulation. If the number of zero's
>>>>>>>>>> in the original dataset is larger than those in the
>>>>>>>>>> simulations, then the model can't capture all zero's. In such
>>>>>>>>>> case, first try to update the model and repeat the procedure.
>>>>>>>>>> If that fails, look for zero-inflated models.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>>>>>> Research Institute for Nature and Forest team Biometrie &
>>>>>>>>>> Kwaliteitszorg / team Biometrics & Quality Assurance
>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>
>>>>>>>>>> To call in the statistician after the experiment is done may
>>>>>>>>>> be no more than asking him to perform a post-mortem
>>>>>>>>>> examination: he may be able to say what the experiment died of.
>>>>>>>>>> ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data.
>>>>>>>>>> ~ Roger Brinner The combination of some data and an aching
>>>>>>>>>> desire for an answer does not ensure that a reasonable answer
>>>>>>>>>> can be extracted from a given body of data. ~ John Tukey
>>>>>>>>>>
>>>>>>>>>> 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com
>>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>:
>>>>>>>>>>
>>>>>>>>>> Thanks Thierry - That totally makes sense. Is there some way of
>>>>>>>>>> formally checking that, except thinking about the setting and
>>>>>>>>>> underlying processes?
>>>>>>>>>>
>>>>>>>>>> On 23.06.2016 11:04, Thierry Onkelinx wrote:
>>>>>>>>>>> Dear Philipp,
>>>>>>>>>>>
>>>>>>>>>>> Do you have just lots of zero's, or more zero's than the
>>>>>>>>>> Poisson
>>>>>>>>>>> distribution can explain? Those are two different things.
>>>>>>>>>> The example
>>>>>>>>>>> below generates data from a Poisson distribution and has
>>>>>>>>>> 99% zero's
>>>>>>>>>>> but no zero-inflation. The second example has only 1%
>>>>>>>>>> zero's but is
>>>>>>>>>>> clearly zero-inflated.
>>>>>>>>>>>
>>>>>>>>>>> set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim
>>>>>>>>>>> == 0) hist(sim)
>>>>>>>>>>>
>>>>>>>>>>> sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
>>>>>>>>>> lambda = 1000)
>>>>>>>>>>> mean(sim.infl == 0) hist(sim.infl)
>>>>>>>>>>>
>>>>>>>>>>> So before looking for zero-inflated models, try to model
>>>>>>>>>> the zero's.
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>>>>>>> Research Institute
>>>>>>>>>> for Nature
>>>>>>>>>>> and Forest team Biometrie & Kwaliteitszorg / team Biometrics
>>>>>>>>>>> & Quality
>>>>>>>>>> Assurance
>>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>>
>>>>>>>>>>> To call in the statistician after the experiment is done
>>>>>>>>>> may be no
>>>>>>>>>>> more than asking him to perform a post-mortem examination:
>>>>>>>>>> he may be
>>>>>>>>>>> able to say what the experiment died of. ~ Sir Ronald
>>>>>>>>>> Aylmer Fisher
>>>>>>>>>>> The plural of anecdote is not data. ~ Roger Brinner The
>>>>>>>>>>> combination of some data and an aching desire for an
>>>>>>>>>> answer does
>>>>>>>>>>> not ensure that a reasonable answer can be extracted from a
>>>>>>>>>> given body
>>>>>>>>>>> of data. ~ John Tukey
>>>>>>>>>>>
>>>>>>>>>>> 2016-06-23 10:07 GMT+02:00 Philipp Singer
>>>>>>>>>> <killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
>>>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>:
>>>>>>>>>>>
>>>>>>>>>>> Dear group - I am currently fitting a Poisson glmer
>>>>>>>>>> where I have
>>>>>>>>>>> an excess of outcomes that are zero (>95%). I am now
>>>>>>>>>> debating on
>>>>>>>>>>> how to proceed and came up with three options:
>>>>>>>>>>>
>>>>>>>>>>> 1.) Just fit a regular glmer to the complete data. I am
>>>>>>>>>> not fully
>>>>>>>>>>> sure how interpret the coefficients then, are they more
>>>>>>>>>> optimizing
>>>>>>>>>>> towards distinguishing zero and non-zero, or also
>>>>>>>>>> capturing the
>>>>>>>>>>> differences in those outcomes that are non-zero?
>>>>>>>>>>>
>>>>>>>>>>> 2.) Leave all zeros out of the data and fit a glmer to
>>>>>>>>>> only those
>>>>>>>>>>> outcomes that are non-zero. Then, I would only learn about
>>>>>>>>>>> differences in the non-zero outcomes though.
>>>>>>>>>>>
>>>>>>>>>>> 3.) Use a zero-inflated Poisson model. My data is quite
>>>>>>>>>>> large-scale, so I am currently playing around with the EM
>>>>>>>>>>> implementation of Bolker et al. that alternates between
>>>>>>>>>> fitting a
>>>>>>>>>>> glmer with data that are weighted according to their zero
>>>>>>>>>>> probability, and fitting a logistic regression for the
>>>>>>>>>> probability
>>>>>>>>>>> that a data point is zero. The method is elaborated for
>>>>>>>>>> the OWL
>>>>>>>>>>> data in:
>>>>>>>>>>>
>>>>>>>>>> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
>>>>>>>>>> <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
>>>>>> I am not fully sure how to interpret the results for the
>>>>>>>>>>> zero-inflated version though. Would I need to interpret the
>>>>>>>>>>> coefficients for the result of the glmer similar to as
>>>>>>>>>> I would do
>>>>>>>>>>> for my idea of 2)? And then on top of that interpret the
>>>>>>>>>>> coefficients for the logistic regression regarding whether
>>>>>>>>>>> something is in the perfect or imperfect state? I am
>>>>>>>>>> also not
>>>>>>>>>>> quite sure what the common approach for the zformula is
>>>>>>>>>> here. The
>>>>>>>>>>> OWL elaborations only use zformula=z~1, so no random
>>>>>>>>>> effect; I
>>>>>>>>>>> would use the same formula as for the glmer.
>>>>>>>>>>>
>>>>>>>>>>> I am appreciating some help and pointers.
>>>>>>>>>>>
>>>>>>>>>>> Thanks! Philipp
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> R-sig-mixed-models at r-project.org
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>>> mailing list
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> R-sig-mixed-models at r-project.org
>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>> mailing list
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-mixed-models at r-project.org
>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>> [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> R-sig-mixed-models at r-project.org
>>>>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-mixed-models at r-project.org
>>>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>> _______________________________________________
>>>>> R-sig-mixed-models at r-project.org
>>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>> 	[[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>