[R-sig-ME] Question about zero-inflated Poisson glmer

Ben Bolker bbolker at gmail.com
Fri Jun 24 15:40:52 CEST 2016


 Probably for now the glmmTMB issues page is best.

 When you go there:

  - details on installation problems/hiccups would be useful
  - a reproducible example for the problem listed below would be useful
  - dispformula is for allowing dispersion/residual variance to vary
with covariates (i.e., modeling heteroscedasticity)

  cheers
    Ben Bolker


On 16-06-24 09:13 AM, Philipp Singer wrote:
> Update, I tried it like that, but receive an error message.
> 
> Warning message:
> In nlminb(start = par, objective = fn, gradient = gr): NA/NaN function evaluation
> 
> Error in solve.default(hessian.fixed): Lapack routine dgesv: system is exactly singular: U[3,3] = 0
> Traceback:
> 
> 1. glmmTMB(y ~ 1 + x + (1 | b),
>   .     data = data, family = "poisson", dispformula = ~1 + x)
> 2. sdreport(obj)
> 3. solve(hessian.fixed)
> 4. solve(hessian.fixed)
> 5. solve.default(hessian.fixed)
> 
> Any ideas on that?
> 
> BTW: Is it fine to post glmmTMB questions here, or should I rather use 
> the github issue page, or is there maybe a dedicated mailing list?
> 
> Thanks,
> Philipp
> 
> On 24.06.2016 14:35, Philipp Singer wrote:
>> It indeed seems to run quite fast; had some trouble installing, but 
>> works now on my 3.3 R setup.
>>
>> One question I have is regarding the specification of dispersion as I 
>> need to specify the dispformula. What is the difference here between 
>> just specifying fixed effects vs. also the random effects?
>>
>> On 23.06.2016 23:07, Mollie Brooks wrote:
>>> glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB") 
>>> and it was 2.3 times faster than glmer for one simple GLMM.
>>>
>>>
>>>> On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote:
>>>>
>>>> Did try glmmADMB but unfortunately it is way too slow for my data.
>>>>
>>>> Did not know about glmmTMB, will try it out. Does it work with 
>>>> crossed random effects and how does it scale with more data? I will 
>>>> check the docu and try it though. Thanks for the info.
>>>>
>>>> On 23.06.2016 19:14, Ben Bolker wrote:
>>>>>   I would also comment that glmmTMB is likely to be much faster 
>>>>> than the
>>>>> lme4-based EM approach ...
>>>>>
>>>>>   cheers
>>>>>     Ben B.
>>>>>
>>>>> On 16-06-23 12:47 PM, Mollie Brooks wrote:
>>>>>> Hi Philipp,
>>>>>>
>>>>>> You could also try fitting the model with and without ZI using either
>>>>>> glmmADMB or glmmTMB. Then compare the AICs. I believe model selection
>>>>>> is useful for this, but I could be missing something since the
>>>>>> simulation procedure that Thierry described seems to recommended more
>>>>>> often.
>>>>>>
>>>>>> https://github.com/glmmTMB/glmmTMB
>>>>>> http://glmmadmb.r-forge.r-project.org
>>>>>>
>>>>>> glmmTMB is still in the development phase, but we’ve done a lot of
>>>>>> testing.
>>>>>>
>>>>>> cheers, Mollie
>>>>>>
>>>>>> ------------------------ Mollie Brooks, PhD Postdoctoral Researcher,
>>>>>> Population Ecology Research Group Department of Evolutionary Biology
>>>>>> & Environmental Studies, University of Zürich
>>>>>> http://www.popecol.org/team/mollie-brooks/
>>>>>>
>>>>>>
>>>>>>> On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
>>>>>>>
>>>>>>> Thanks, great information, that is really helpful.
>>>>>>>
>>>>>>> I agree that those are different things, however when using a
>>>>>>> random effect for overdispersion, I can simulate the same number of
>>>>>>> zero outcomes (~95%).
>>>>>>>
>>>>>>> On 23.06.2016 15:50, Thierry Onkelinx wrote:
>>>>>>>> Be careful when using overdispersion to model zero-inflation.
>>>>>>>> Those are two different things.
>>>>>>>>
>>>>>>>> I've put some information together in
>>>>>>>> http://rpubs.com/INBOstats/zeroinflation
>>>>>>>>
>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>>>> Research Institute for Nature and Forest team Biometrie &
>>>>>>>> Kwaliteitszorg / team Biometrics & Quality Assurance
>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>
>>>>>>>> To call in the statistician after the experiment is done may be
>>>>>>>> no more than asking him to perform a post-mortem examination: he
>>>>>>>> may be able to say what the experiment died of. ~ Sir Ronald
>>>>>>>> Aylmer Fisher The plural of anecdote is not data. ~ Roger
>>>>>>>> Brinner The combination of some data and an aching desire for an
>>>>>>>> answer does not ensure that a reasonable answer can be extracted
>>>>>>>> from a given body of data. ~ John Tukey
>>>>>>>>
>>>>>>>> 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com
>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>:
>>>>>>>>
>>>>>>>> Thanks! Actually, accounting for overdispersion is super
>>>>>>>> important as it seems, then the zeros can be captured well.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23.06.2016 11:50, Thierry Onkelinx wrote:
>>>>>>>>> Dear Philipp,
>>>>>>>>>
>>>>>>>>> 1. Fit a Poisson model to the data. 2. Simulate a new response
>>>>>>>>> vector for the dataset according to the model. 3. Count the
>>>>>>>>> number of zero's in the simulated response vector. 4. Repeat
>>>>>>>>> step 2 and 3 a decent number of time and plot a histogram of
>>>>>>>>> the number of zero's in the simulation. If the number of zero's
>>>>>>>>> in the original dataset is larger than those in the
>>>>>>>>> simulations, then the model can't capture all zero's. In such
>>>>>>>>> case, first try to update the model and repeat the procedure.
>>>>>>>>> If that fails, look for zero-inflated models.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>>>>> Research Institute for Nature and Forest team Biometrie &
>>>>>>>>> Kwaliteitszorg / team Biometrics & Quality Assurance
>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>
>>>>>>>>> To call in the statistician after the experiment is done may
>>>>>>>>> be no more than asking him to perform a post-mortem
>>>>>>>>> examination: he may be able to say what the experiment died of.
>>>>>>>>> ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data.
>>>>>>>>> ~ Roger Brinner The combination of some data and an aching
>>>>>>>>> desire for an answer does not ensure that a reasonable answer
>>>>>>>>> can be extracted from a given body of data. ~ John Tukey
>>>>>>>>>
>>>>>>>>> 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com
>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>:
>>>>>>>>>
>>>>>>>>> Thanks Thierry - That totally makes sense. Is there some way of
>>>>>>>>> formally checking that, except thinking about the setting and
>>>>>>>>> underlying processes?
>>>>>>>>>
>>>>>>>>> On 23.06.2016 11:04, Thierry Onkelinx wrote:
>>>>>>>>>> Dear Philipp,
>>>>>>>>>>
>>>>>>>>>> Do you have just lots of zero's, or more zero's than the
>>>>>>>>> Poisson
>>>>>>>>>> distribution can explain? Those are two different things.
>>>>>>>>> The example
>>>>>>>>>> below generates data from a Poisson distribution and has
>>>>>>>>> 99% zero's
>>>>>>>>>> but no zero-inflation. The second example has only 1%
>>>>>>>>> zero's but is
>>>>>>>>>> clearly zero-inflated.
>>>>>>>>>>
>>>>>>>>>> set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim
>>>>>>>>>> == 0) hist(sim)
>>>>>>>>>>
>>>>>>>>>> sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
>>>>>>>>> lambda = 1000)
>>>>>>>>>> mean(sim.infl == 0) hist(sim.infl)
>>>>>>>>>>
>>>>>>>>>> So before looking for zero-inflated models, try to model
>>>>>>>>> the zero's.
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>>>>>> Research Institute
>>>>>>>>> for Nature
>>>>>>>>>> and Forest team Biometrie & Kwaliteitszorg / team Biometrics
>>>>>>>>>> & Quality
>>>>>>>>> Assurance
>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>
>>>>>>>>>> To call in the statistician after the experiment is done
>>>>>>>>> may be no
>>>>>>>>>> more than asking him to perform a post-mortem examination:
>>>>>>>>> he may be
>>>>>>>>>> able to say what the experiment died of. ~ Sir Ronald
>>>>>>>>> Aylmer Fisher
>>>>>>>>>> The plural of anecdote is not data. ~ Roger Brinner The
>>>>>>>>>> combination of some data and an aching desire for an
>>>>>>>>> answer does
>>>>>>>>>> not ensure that a reasonable answer can be extracted from a
>>>>>>>>> given body
>>>>>>>>>> of data. ~ John Tukey
>>>>>>>>>>
>>>>>>>>>> 2016-06-23 10:07 GMT+02:00 Philipp Singer
>>>>>>>>> <killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
>>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>>>> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>:
>>>>>>>>>>
>>>>>>>>>> Dear group - I am currently fitting a Poisson glmer
>>>>>>>>> where I have
>>>>>>>>>> an excess of outcomes that are zero (>95%). I am now
>>>>>>>>> debating on
>>>>>>>>>> how to proceed and came up with three options:
>>>>>>>>>>
>>>>>>>>>> 1.) Just fit a regular glmer to the complete data. I am
>>>>>>>>> not fully
>>>>>>>>>> sure how interpret the coefficients then, are they more
>>>>>>>>> optimizing
>>>>>>>>>> towards distinguishing zero and non-zero, or also
>>>>>>>>> capturing the
>>>>>>>>>> differences in those outcomes that are non-zero?
>>>>>>>>>>
>>>>>>>>>> 2.) Leave all zeros out of the data and fit a glmer to
>>>>>>>>> only those
>>>>>>>>>> outcomes that are non-zero. Then, I would only learn about
>>>>>>>>>> differences in the non-zero outcomes though.
>>>>>>>>>>
>>>>>>>>>> 3.) Use a zero-inflated Poisson model. My data is quite
>>>>>>>>>> large-scale, so I am currently playing around with the EM
>>>>>>>>>> implementation of Bolker et al. that alternates between
>>>>>>>>> fitting a
>>>>>>>>>> glmer with data that are weighted according to their zero
>>>>>>>>>> probability, and fitting a logistic regression for the
>>>>>>>>> probability
>>>>>>>>>> that a data point is zero. The method is elaborated for
>>>>>>>>> the OWL
>>>>>>>>>> data in:
>>>>>>>>>>
>>>>>>>>> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
>>>>>>>>> <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
>>>>>>>>>>
>>>>> I am not fully sure how to interpret the results for the
>>>>>>>>>> zero-inflated version though. Would I need to interpret the
>>>>>>>>>> coefficients for the result of the glmer similar to as
>>>>>>>>> I would do
>>>>>>>>>> for my idea of 2)? And then on top of that interpret the
>>>>>>>>>> coefficients for the logistic regression regarding whether
>>>>>>>>>> something is in the perfect or imperfect state? I am
>>>>>>>>> also not
>>>>>>>>>> quite sure what the common approach for the zformula is
>>>>>>>>> here. The
>>>>>>>>>> OWL elaborations only use zformula=z~1, so no random
>>>>>>>>> effect; I
>>>>>>>>>> would use the same formula as for the glmer.
>>>>>>>>>>
>>>>>>>>>> I am appreciating some help and pointers.
>>>>>>>>>>
>>>>>>>>>> Thanks! Philipp
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> R-sig-mixed-models at r-project.org 
>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>>> mailing list
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> R-sig-mixed-models at r-project.org 
>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org>> mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> R-sig-mixed-models at r-project.org 
>>>>>>> <mailto:R-sig-mixed-models at r-project.org>
>>>>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-mixed-models at r-project.org 
>>>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>
>>>>> _______________________________________________
>>>>> R-sig-mixed-models at r-project.org 
>>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org 
>>>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>



More information about the R-sig-mixed-models mailing list