[R-sig-ME] Question about zero-inflated Poisson glmer

Mon Jun 27 15:06:45 CEST 2016

Dear Philipp,

How strong is the variance of the observation level random effect? I would
trust a model with large OLRE variance.

Best regards,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-06-27 14:59 GMT+02:00 Philipp Singer <killver op gmail.com>:

> I have now played around more with the data an the models both using lme4
> and glmmTMB.
>
> I can report the following:
>
> Modeling the data with a zero-inflated Poisson improves the model
> significantly. However, when calling predict and simulating rpoissons, I
> end up with nearly no values that are zero (in the original data there are
> 96% zero).
>
> When I model the data with overdisperion by including an observation-level
> random effect, I can also improve the model (not surprisingly due to the
> random effect). When I predict outcomes by ignoring the observation-level
> random effect (in lme4), I receive bad prediction if I compare it to the
> original data. While many zeros can be captured (of course), the positive
> outcomes can not be captured well.
>
> Combining zero inflation and overdispersion further improves the model, but
> I can only do that with glmmTMB and then have troubles doing predictions
> ignoring the observation-level random effect.
>
> Another side question:
>
> In lme4, when I do:
>
> m = glm(x~1,family="poisson")
> rpois(n=len(data),lambda=predict(m, type='response',re.form=NA)
>
> vs.
>
> simulate(1,m,re.form=NA)
>
> I receive different outcomes? Do I understand these function wrongly?
>
> Would appreciate some more help/pointers!
>
> Thanks,
> Philipp
>
> 2016-06-24 15:52 GMT+02:00 Philipp Singer <killver op gmail.com>:
>
> > Thanks - I started an issue there to answer some of my questions.
> >
> > Regarding the installation: I was trying to somehow do it in anaconda
> with
> > a specific R kernel and had some issues. I am trying to resort that with
> > the anaconda guys though, if I have a tutorial on how to properly setup
> > glmmTMB in anaconda, I will let you know. The install worked fine in my
> > standard R environment.
> >
> >
> > On 24.06.2016 15:40, Ben Bolker wrote:
> >
> >>   Probably for now the glmmTMB issues page is best.
> >>
> >>   When you go there:
> >>
> >>    - details on installation problems/hiccups would be useful
> >>    - a reproducible example for the problem listed below would be useful
> >>    - dispformula is for allowing dispersion/residual variance to vary
> >> with covariates (i.e., modeling heteroscedasticity)
> >>
> >>    cheers
> >>      Ben Bolker
> >>
> >>
> >> On 16-06-24 09:13 AM, Philipp Singer wrote:
> >>
> >>> Update, I tried it like that, but receive an error message.
> >>>
> >>> Warning message:
> >>> In nlminb(start = par, objective = fn, gradient = gr): NA/NaN function
> >>> evaluation
> >>>
> >>> Error in solve.default(hessian.fixed): Lapack routine dgesv: system is
> >>> exactly singular: U[3,3] = 0
> >>> Traceback:
> >>>
> >>> 1. glmmTMB(y ~ 1 + x + (1 | b),
> >>>    .     data = data, family = "poisson", dispformula = ~1 + x)
> >>> 2. sdreport(obj)
> >>> 3. solve(hessian.fixed)
> >>> 4. solve(hessian.fixed)
> >>> 5. solve.default(hessian.fixed)
> >>>
> >>> Any ideas on that?
> >>>
> >>> BTW: Is it fine to post glmmTMB questions here, or should I rather use
> >>> the github issue page, or is there maybe a dedicated mailing list?
> >>>
> >>> Thanks,
> >>> Philipp
> >>>
> >>> On 24.06.2016 14:35, Philipp Singer wrote:
> >>>
> >>>> It indeed seems to run quite fast; had some trouble installing, but
> >>>> works now on my 3.3 R setup.
> >>>>
> >>>> One question I have is regarding the specification of dispersion as I
> >>>> need to specify the dispformula. What is the difference here between
> >>>> just specifying fixed effects vs. also the random effects?
> >>>>
> >>>> On 23.06.2016 23:07, Mollie Brooks wrote:
> >>>>
> >>>>> glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB")
> >>>>> and it was 2.3 times faster than glmer for one simple GLMM.
> >>>>>
> >>>>>
> >>>>> On 23Jun 2016, at 10:44, Philipp Singer <killver op gmail.com> wrote:
> >>>>>>
> >>>>>> Did try glmmADMB but unfortunately it is way too slow for my data.
> >>>>>>
> >>>>>> Did not know about glmmTMB, will try it out. Does it work with
> >>>>>> crossed random effects and how does it scale with more data? I will
> >>>>>> check the docu and try it though. Thanks for the info.
> >>>>>>
> >>>>>> On 23.06.2016 19:14, Ben Bolker wrote:
> >>>>>>
> >>>>>>>    I would also comment that glmmTMB is likely to be much faster
> >>>>>>> than the
> >>>>>>> lme4-based EM approach ...
> >>>>>>>
> >>>>>>>    cheers
> >>>>>>>      Ben B.
> >>>>>>>
> >>>>>>> On 16-06-23 12:47 PM, Mollie Brooks wrote:
> >>>>>>>
> >>>>>>>> Hi Philipp,
> >>>>>>>>
> >>>>>>>> You could also try fitting the model with and without ZI using
> >>>>>>>> either
> >>>>>>>> glmmADMB or glmmTMB. Then compare the AICs. I believe model
> >>>>>>>> selection
> >>>>>>>> is useful for this, but I could be missing something since the
> >>>>>>>> simulation procedure that Thierry described seems to recommended
> >>>>>>>> more
> >>>>>>>> often.
> >>>>>>>>
> >>>>>>>> https://github.com/glmmTMB/glmmTMB
> >>>>>>>> http://glmmadmb.r-forge.r-project.org
> >>>>>>>>
> >>>>>>>> glmmTMB is still in the development phase, but we’ve done a lot of
> >>>>>>>> testing.
> >>>>>>>>
> >>>>>>>> cheers, Mollie
> >>>>>>>>
> >>>>>>>> ------------------------ Mollie Brooks, PhD Postdoctoral
> Researcher,
> >>>>>>>> Population Ecology Research Group Department of Evolutionary
> Biology
> >>>>>>>> & Environmental Studies, University of Zürich
> >>>>>>>> http://www.popecol.org/team/mollie-brooks/
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 23Jun 2016, at 8:22, Philipp Singer <killver op gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks, great information, that is really helpful.
> >>>>>>>>>
> >>>>>>>>> I agree that those are different things, however when using a
> >>>>>>>>> random effect for overdispersion, I can simulate the same number
> of
> >>>>>>>>> zero outcomes (~95%).
> >>>>>>>>>
> >>>>>>>>> On 23.06.2016 15:50, Thierry Onkelinx wrote:
> >>>>>>>>>
> >>>>>>>>>> Be careful when using overdispersion to model zero-inflation.
> >>>>>>>>>> Those are two different things.
> >>>>>>>>>>
> >>>>>>>>>> I've put some information together in
> >>>>>>>>>> http://rpubs.com/INBOstats/zeroinflation
> >>>>>>>>>>
> >>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
> >>>>>>>>>> Research Institute for Nature and Forest team Biometrie &
> >>>>>>>>>> Kwaliteitszorg / team Biometrics & Quality Assurance
> >>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
> >>>>>>>>>>
> >>>>>>>>>> To call in the statistician after the experiment is done may be
> >>>>>>>>>> no more than asking him to perform a post-mortem examination: he
> >>>>>>>>>> may be able to say what the experiment died of. ~ Sir Ronald
> >>>>>>>>>> Aylmer Fisher The plural of anecdote is not data. ~ Roger
> >>>>>>>>>> Brinner The combination of some data and an aching desire for an
> >>>>>>>>>> answer does not ensure that a reasonable answer can be extracted
> >>>>>>>>>> from a given body of data. ~ John Tukey
> >>>>>>>>>>
> >>>>>>>>>> 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver op gmail.com
> >>>>>>>>>> <mailto:killver op gmail.com <mailto:killver op gmail.com>>>:
> >>>>>>>>>>
> >>>>>>>>>> Thanks! Actually, accounting for overdispersion is super
> >>>>>>>>>> important as it seems, then the zeros can be captured well.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 23.06.2016 11:50, Thierry Onkelinx wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Dear Philipp,
> >>>>>>>>>>>
> >>>>>>>>>>> 1. Fit a Poisson model to the data. 2. Simulate a new response
> >>>>>>>>>>> vector for the dataset according to the model. 3. Count the
> >>>>>>>>>>> number of zero's in the simulated response vector. 4. Repeat
> >>>>>>>>>>> step 2 and 3 a decent number of time and plot a histogram of
> >>>>>>>>>>> the number of zero's in the simulation. If the number of zero's
> >>>>>>>>>>> in the original dataset is larger than those in the
> >>>>>>>>>>> simulations, then the model can't capture all zero's. In such
> >>>>>>>>>>> case, first try to update the model and repeat the procedure.
> >>>>>>>>>>> If that fails, look for zero-inflated models.
> >>>>>>>>>>>
> >>>>>>>>>>> Best regards,
> >>>>>>>>>>>
> >>>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
> >>>>>>>>>>> Research Institute for Nature and Forest team Biometrie &
> >>>>>>>>>>> Kwaliteitszorg / team Biometrics & Quality Assurance
> >>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
> >>>>>>>>>>>
> >>>>>>>>>>> To call in the statistician after the experiment is done may
> >>>>>>>>>>> be no more than asking him to perform a post-mortem
> >>>>>>>>>>> examination: he may be able to say what the experiment died of.
> >>>>>>>>>>> ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data.
> >>>>>>>>>>> ~ Roger Brinner The combination of some data and an aching
> >>>>>>>>>>> desire for an answer does not ensure that a reasonable answer
> >>>>>>>>>>> can be extracted from a given body of data. ~ John Tukey
> >>>>>>>>>>>
> >>>>>>>>>>> 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver op gmail.com
> >>>>>>>>>>> <mailto:killver op gmail.com <mailto:killver op gmail.com>>>:
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks Thierry - That totally makes sense. Is there some way of
> >>>>>>>>>>> formally checking that, except thinking about the setting and
> >>>>>>>>>>> underlying processes?
> >>>>>>>>>>>
> >>>>>>>>>>> On 23.06.2016 11:04, Thierry Onkelinx wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Dear Philipp,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Do you have just lots of zero's, or more zero's than the
> >>>>>>>>>>>>
> >>>>>>>>>>> Poisson
> >>>>>>>>>>>
> >>>>>>>>>>>> distribution can explain? Those are two different things.
> >>>>>>>>>>>>
> >>>>>>>>>>> The example
> >>>>>>>>>>>
> >>>>>>>>>>>> below generates data from a Poisson distribution and has
> >>>>>>>>>>>>
> >>>>>>>>>>> 99% zero's
> >>>>>>>>>>>
> >>>>>>>>>>>> but no zero-inflation. The second example has only 1%
> >>>>>>>>>>>>
> >>>>>>>>>>> zero's but is
> >>>>>>>>>>>
> >>>>>>>>>>>> clearly zero-inflated.
> >>>>>>>>>>>>
> >>>>>>>>>>>> set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim
> >>>>>>>>>>>> == 0) hist(sim)
> >>>>>>>>>>>>
> >>>>>>>>>>>> sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
> >>>>>>>>>>>>
> >>>>>>>>>>> lambda = 1000)
> >>>>>>>>>>>
> >>>>>>>>>>>> mean(sim.infl == 0) hist(sim.infl)
> >>>>>>>>>>>>
> >>>>>>>>>>>> So before looking for zero-inflated models, try to model
> >>>>>>>>>>>>
> >>>>>>>>>>> the zero's.
> >>>>>>>>>>>
> >>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
> >>>>>>>>>>>> Research Institute
> >>>>>>>>>>>>
> >>>>>>>>>>> for Nature
> >>>>>>>>>>>
> >>>>>>>>>>>> and Forest team Biometrie & Kwaliteitszorg / team Biometrics
> >>>>>>>>>>>> & Quality
> >>>>>>>>>>>>
> >>>>>>>>>>> Assurance
> >>>>>>>>>>>
> >>>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
> >>>>>>>>>>>>
> >>>>>>>>>>>> To call in the statistician after the experiment is done
> >>>>>>>>>>>>
> >>>>>>>>>>> may be no
> >>>>>>>>>>>
> >>>>>>>>>>>> more than asking him to perform a post-mortem examination:
> >>>>>>>>>>>>
> >>>>>>>>>>> he may be
> >>>>>>>>>>>
> >>>>>>>>>>>> able to say what the experiment died of. ~ Sir Ronald
> >>>>>>>>>>>>
> >>>>>>>>>>> Aylmer Fisher
> >>>>>>>>>>>
> >>>>>>>>>>>> The plural of anecdote is not data. ~ Roger Brinner The
> >>>>>>>>>>>> combination of some data and an aching desire for an
> >>>>>>>>>>>>
> >>>>>>>>>>> answer does
> >>>>>>>>>>>
> >>>>>>>>>>>> not ensure that a reasonable answer can be extracted from a
> >>>>>>>>>>>>
> >>>>>>>>>>> given body
> >>>>>>>>>>>
> >>>>>>>>>>>> of data. ~ John Tukey
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2016-06-23 10:07 GMT+02:00 Philipp Singer
> >>>>>>>>>>>>
> >>>>>>>>>>> <killver op gmail.com <mailto:killver op gmail.com>
> >>>>>>>>>>> <mailto:killver op gmail.com <mailto:killver op gmail.com>>
> >>>>>>>>>>>
> >>>>>>>>>>>> <mailto:killver op gmail.com <mailto:killver op gmail.com>
> >>>>>>>>>>>> <mailto:killver op gmail.com <mailto:killver op gmail.com>>>>:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Dear group - I am currently fitting a Poisson glmer
> >>>>>>>>>>>>
> >>>>>>>>>>> where I have
> >>>>>>>>>>>
> >>>>>>>>>>>> an excess of outcomes that are zero (>95%). I am now
> >>>>>>>>>>>>
> >>>>>>>>>>> debating on
> >>>>>>>>>>>
> >>>>>>>>>>>> how to proceed and came up with three options:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1.) Just fit a regular glmer to the complete data. I am
> >>>>>>>>>>>>
> >>>>>>>>>>> not fully
> >>>>>>>>>>>
> >>>>>>>>>>>> sure how interpret the coefficients then, are they more
> >>>>>>>>>>>>
> >>>>>>>>>>> optimizing
> >>>>>>>>>>>
> >>>>>>>>>>>> towards distinguishing zero and non-zero, or also
> >>>>>>>>>>>>
> >>>>>>>>>>> capturing the
> >>>>>>>>>>>
> >>>>>>>>>>>> differences in those outcomes that are non-zero?
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2.) Leave all zeros out of the data and fit a glmer to
> >>>>>>>>>>>>
> >>>>>>>>>>> only those
> >>>>>>>>>>>
> >>>>>>>>>>>> outcomes that are non-zero. Then, I would only learn about
> >>>>>>>>>>>> differences in the non-zero outcomes though.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3.) Use a zero-inflated Poisson model. My data is quite
> >>>>>>>>>>>> large-scale, so I am currently playing around with the EM
> >>>>>>>>>>>> implementation of Bolker et al. that alternates between
> >>>>>>>>>>>>
> >>>>>>>>>>> fitting a
> >>>>>>>>>>>
> >>>>>>>>>>>> glmer with data that are weighted according to their zero
> >>>>>>>>>>>> probability, and fitting a logistic regression for the
> >>>>>>>>>>>>
> >>>>>>>>>>> probability
> >>>>>>>>>>>
> >>>>>>>>>>>> that a data point is zero. The method is elaborated for
> >>>>>>>>>>>>
> >>>>>>>>>>> the OWL
> >>>>>>>>>>>
> >>>>>>>>>>>> data in:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
> >>>>>>>>>>> <
> >>>>>>>>>>>
> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
> >>>>>>>>>>> >
> >>>>>>>>>>>
> >>>>>>>>>> I am not fully sure how to interpret the results for the
> >>>>>>>
> >>>>>>>> zero-inflated version though. Would I need to interpret the
> >>>>>>>>>>>> coefficients for the result of the glmer similar to as
> >>>>>>>>>>>>
> >>>>>>>>>>> I would do
> >>>>>>>>>>>
> >>>>>>>>>>>> for my idea of 2)? And then on top of that interpret the
> >>>>>>>>>>>> coefficients for the logistic regression regarding whether
> >>>>>>>>>>>> something is in the perfect or imperfect state? I am
> >>>>>>>>>>>>
> >>>>>>>>>>> also not
> >>>>>>>>>>>
> >>>>>>>>>>>> quite sure what the common approach for the zformula is
> >>>>>>>>>>>>
> >>>>>>>>>>> here. The
> >>>>>>>>>>>
> >>>>>>>>>>>> OWL elaborations only use zformula=z~1, so no random
> >>>>>>>>>>>>
> >>>>>>>>>>> effect; I
> >>>>>>>>>>>
> >>>>>>>>>>>> would use the same formula as for the glmer.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am appreciating some help and pointers.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks! Philipp
> >>>>>>>>>>>>
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> R-sig-mixed-models op r-project.org
> >>>>>>>>>>>> <mailto:R-sig-mixed-models op r-project.org>
> >>>>>>>>>>>> <mailto:R-sig-mixed-models op r-project.org>
> >>>>>>>>>>>>
> >>>>>>>>>>> <mailto:R-sig-mixed-models op r-project.org
> >>>>>>>>>>> <mailto:R-sig-mixed-models op r-project.org>>
> >>>>>>>>>>>
> >>>>>>>>>>>> <mailto:R-sig-mixed-models op r-project.org
> >>>>>>>>>>>> <mailto:R-sig-mixed-models op r-project.org>
> >>>>>>>>>>>>
> >>>>>>>>>>> <mailto:R-sig-mixed-models op r-project.org
> >>>>>>>>>>> <mailto:R-sig-mixed-models op r-project.org>>> mailing list
> >>>>>>>>>>>
> >>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> [[alternative HTML version deleted]]
> >>>>>>>>>>>
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> R-sig-mixed-models op r-project.org
> >>>>>>>>>>> <mailto:R-sig-mixed-models op r-project.org>
> >>>>>>>>>>> <mailto:R-sig-mixed-models op r-project.org>
> >>>>>>>>>>> <mailto:R-sig-mixed-models op r-project.org
> >>>>>>>>>>> <mailto:R-sig-mixed-models op r-project.org>> mailing list
> >>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> [[alternative HTML version deleted]]
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> R-sig-mixed-models op r-project.org
> >>>>>>>>> <mailto:R-sig-mixed-models op r-project.org>
> >>>>>>>>> <mailto:R-sig-mixed-models op r-project.org> mailing list
> >>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> >>>>>>>>>
> >>>>>>>> [[alternative HTML version deleted]]
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> R-sig-mixed-models op r-project.org
> >>>>>>>> <mailto:R-sig-mixed-models op r-project.org> mailing list
> >>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>> R-sig-mixed-models op r-project.org
> >>>>>>> <mailto:R-sig-mixed-models op r-project.org> mailing list
> >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>>>>
> >>>>>> _______________________________________________
> >>>>>> R-sig-mixed-models op r-project.org
> >>>>>> <mailto:R-sig-mixed-models op r-project.org> mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>>>
> >>>>>
> >>>         [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-mixed-models op r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>
> >>>
> >
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models op r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]