[R-sig-ME] Question about zero-inflated Poisson glmer

Thierry Onkelinx thierry.onkelinx at inbo.be
Mon Jun 27 22:17:58 CEST 2016


A gap between zero and the lowest non-zero count is normal for
zero-inflated data when the Poisson part has a high mean. In that case very
few (if any) zero's stem from the Poisson part.

Another option is to try a hurdle model. Or approximate a hurdle model by
fitting two separate models: a logistic regression (zero or not) and a
Poisson regression (count > 0). In case Prob(Poisson(mu) == 0) is small
then the approximation should be OK.

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-06-27 22:03 GMT+02:00 Philipp Singer <killver op gmail.com>:

> NBinom was not really successfull unitl now, but will try to tune. Thanks
> for your help!
>
> One point I forgot to mention was that apart from my excess of zeros, the
> lowest data outcome is 10, so there is a gap between zeri and 10. Could
> that be somehow a problem?
>
> On 27.06.2016 21:59, Thierry Onkelinx wrote:
>
> If there is overdispersion, then try a negative binomial model or a
> zero-inflated negative binomial model. If not try a zero-inflated Poisson.
> Adding relevant covariates can reduce overdispersion.
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2016-06-27 17:46 GMT+02:00 Philipp Singer <killver op gmail.com>:
>
>> Well, as posted beforehand the std dev is 9.5 ... so does not seem too
>> good then :/
>>
>> Any other idea?
>>
>>
>> On 27.06.2016 17:31, Thierry Onkelinx wrote:
>>
>> Dear Philipp,
>>
>> You've been bitten by observation level random effects. I've put together
>> a document about it on http://rpubs.com/INBOstats/OLRE. Bottomline
>> you're OKish when the standard devation of the OLRE smaller than 1. You're
>> in trouble when it's above 3. In between you need to check the model
>> carefully.
>>
>> Best regards,
>>
>> ir. Thierry Onkelinx
>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
>> and Forest
>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
>> Kliniekstraat 25
>> 1070 Anderlecht
>> Belgium
>>
>> To call in the statistician after the experiment is done may be no more
>> than asking him to perform a post-mortem examination: he may be able to say
>> what the experiment died of. ~ Sir Ronald Aylmer Fisher
>> The plural of anecdote is not data. ~ Roger Brinner
>> The combination of some data and an aching desire for an answer does not
>> ensure that a reasonable answer can be extracted from a given body of data.
>> ~ John Tukey
>>
>> 2016-06-27 16:17 GMT+02:00 Philipp Singer < <killver op gmail.com>
>> killver op gmail.com>:
>>
>>> Here is the fitted vs. residual plot for the observation-level poisson
>>> model where the observation level has been removed as taken from:
>>> <https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html>
>>> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html
>>>
>>> So basically the prediction is always close to zero.
>>>
>>> Note that this is just on a very small sample (1000 data points).
>>>
>>> If I fit a nbinom2 to this smalle sample, I get predictions that are
>>> always around ~20 (but never zero). Both plots are attached.
>>>
>>> What I am wondering is whether I can do inference on a fixed parameter
>>> in my model which is my main task of this study. The effect is similar in
>>> the different models and in general I am only itnerested in whether it is
>>> positive/negative and "significant" which it is. However, as can be seen,
>>> the prediction looks not too good here.
>>>
>>>
>>>
>>>
>>> 2016-06-27 15:18 GMT+02:00 Philipp Singer < <killver op gmail.com>
>>> killver op gmail.com>:
>>>
>>>> The variance is:
>>>>
>>>> Conditional model:
>>>>  Groups            Name        Variance  Std.Dev.
>>>>  obs               (Intercept) 8.991e+01 9.4823139
>>>>
>>>>
>>>>
>>>> 2016-06-27 15:06 GMT+02:00 Thierry Onkelinx <
>>>> <thierry.onkelinx op inbo.be>thierry.onkelinx op inbo.be>:
>>>>
>>>>> Dear Philipp,
>>>>>
>>>>> How strong is the variance of the observation level random effect? I
>>>>> would trust a model with large OLRE variance.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Thierry
>>>>>
>>>>> ir. Thierry Onkelinx
>>>>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
>>>>> and Forest
>>>>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
>>>>> Kliniekstraat 25
>>>>> 1070 Anderlecht
>>>>> Belgium
>>>>>
>>>>> To call in the statistician after the experiment is done may be no
>>>>> more than asking him to perform a post-mortem examination: he may be able
>>>>> to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
>>>>> The plural of anecdote is not data. ~ Roger Brinner
>>>>> The combination of some data and an aching desire for an answer does
>>>>> not ensure that a reasonable answer can be extracted from a given body of
>>>>> data. ~ John Tukey
>>>>>
>>>>> 2016-06-27 14:59 GMT+02:00 Philipp Singer < <killver op gmail.com>
>>>>> killver op gmail.com>:
>>>>>
>>>>>> I have now played around more with the data an the models both using
>>>>>> lme4
>>>>>> and glmmTMB.
>>>>>>
>>>>>> I can report the following:
>>>>>>
>>>>>> Modeling the data with a zero-inflated Poisson improves the model
>>>>>> significantly. However, when calling predict and simulating
>>>>>> rpoissons, I
>>>>>> end up with nearly no values that are zero (in the original data
>>>>>> there are
>>>>>> 96% zero).
>>>>>>
>>>>>> When I model the data with overdisperion by including an
>>>>>> observation-level
>>>>>> random effect, I can also improve the model (not surprisingly due to
>>>>>> the
>>>>>> random effect). When I predict outcomes by ignoring the
>>>>>> observation-level
>>>>>> random effect (in lme4), I receive bad prediction if I compare it to
>>>>>> the
>>>>>> original data. While many zeros can be captured (of course), the
>>>>>> positive
>>>>>> outcomes can not be captured well.
>>>>>>
>>>>>> Combining zero inflation and overdispersion further improves the
>>>>>> model, but
>>>>>> I can only do that with glmmTMB and then have troubles doing
>>>>>> predictions
>>>>>> ignoring the observation-level random effect.
>>>>>>
>>>>>> Another side question:
>>>>>>
>>>>>> In lme4, when I do:
>>>>>>
>>>>>> m = glm(x~1,family="poisson")
>>>>>> rpois(n=len(data),lambda=predict(m, type='response',re.form=NA)
>>>>>>
>>>>>> vs.
>>>>>>
>>>>>> simulate(1,m,re.form=NA)
>>>>>>
>>>>>> I receive different outcomes? Do I understand these function wrongly?
>>>>>>
>>>>>> Would appreciate some more help/pointers!
>>>>>>
>>>>>> Thanks,
>>>>>> Philipp
>>>>>>
>>>>>> 2016-06-24 15:52 GMT+02:00 Philipp Singer < <killver op gmail.com>
>>>>>> killver op gmail.com>:
>>>>>>
>>>>>> > Thanks - I started an issue there to answer some of my questions.
>>>>>> >
>>>>>> > Regarding the installation: I was trying to somehow do it in
>>>>>> anaconda with
>>>>>> > a specific R kernel and had some issues. I am trying to resort that
>>>>>> with
>>>>>> > the anaconda guys though, if I have a tutorial on how to properly
>>>>>> setup
>>>>>> > glmmTMB in anaconda, I will let you know. The install worked fine
>>>>>> in my
>>>>>> > standard R environment.
>>>>>> >
>>>>>> >
>>>>>> > On 24.06.2016 15 <24.06.2016%2015>:40, Ben Bolker wrote:
>>>>>> >
>>>>>> >>   Probably for now the glmmTMB issues page is best.
>>>>>> >>
>>>>>> >>   When you go there:
>>>>>> >>
>>>>>> >>    - details on installation problems/hiccups would be useful
>>>>>> >>    - a reproducible example for the problem listed below would be
>>>>>> useful
>>>>>> >>    - dispformula is for allowing dispersion/residual variance to
>>>>>> vary
>>>>>> >> with covariates (i.e., modeling heteroscedasticity)
>>>>>> >>
>>>>>> >>    cheers
>>>>>> >>      Ben Bolker
>>>>>> >>
>>>>>> >>
>>>>>> >> On 16-06-24 09:13 AM, Philipp Singer wrote:
>>>>>> >>
>>>>>> >>> Update, I tried it like that, but receive an error message.
>>>>>> >>>
>>>>>> >>> Warning message:
>>>>>> >>> In nlminb(start = par, objective = fn, gradient = gr): NA/NaN
>>>>>> function
>>>>>> >>> evaluation
>>>>>> >>>
>>>>>> >>> Error in solve.default(hessian.fixed): Lapack routine dgesv:
>>>>>> system is
>>>>>> >>> exactly singular: U[3,3] = 0
>>>>>> >>> Traceback:
>>>>>> >>>
>>>>>> >>> 1. glmmTMB(y ~ 1 + x + (1 | b),
>>>>>> >>>    .     data = data, family = "poisson", dispformula = ~1 + x)
>>>>>> >>> 2. sdreport(obj)
>>>>>> >>> 3. solve(hessian.fixed)
>>>>>> >>> 4. solve(hessian.fixed)
>>>>>> >>> 5. solve.default(hessian.fixed)
>>>>>> >>>
>>>>>> >>> Any ideas on that?
>>>>>> >>>
>>>>>> >>> BTW: Is it fine to post glmmTMB questions here, or should I
>>>>>> rather use
>>>>>> >>> the github issue page, or is there maybe a dedicated mailing list?
>>>>>> >>>
>>>>>> >>> Thanks,
>>>>>> >>> Philipp
>>>>>> >>>
>>>>>> >>> On 24.06.2016 14:35, Philipp Singer wrote:
>>>>>> >>>
>>>>>> >>>> It indeed seems to run quite fast; had some trouble installing,
>>>>>> but
>>>>>> >>>> works now on my 3.3 R setup.
>>>>>> >>>>
>>>>>> >>>> One question I have is regarding the specification of dispersion
>>>>>> as I
>>>>>> >>>> need to specify the dispformula. What is the difference here
>>>>>> between
>>>>>> >>>> just specifying fixed effects vs. also the random effects?
>>>>>> >>>>
>>>>>> >>>> On 23.06.2016 23:07, Mollie Brooks wrote:
>>>>>> >>>>
>>>>>> >>>>> glmmTMB does crossed RE. Ben did some timings in
>>>>>> vignette("glmmTMB")
>>>>>> >>>>> and it was 2.3 times faster than glmer for one simple GLMM.
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> On 23Jun 2016, at 10:44, Philipp Singer < <killver op gmail.com>
>>>>>> killver op gmail.com> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>> Did try glmmADMB but unfortunately it is way too slow for my
>>>>>> data.
>>>>>> >>>>>>
>>>>>> >>>>>> Did not know about glmmTMB, will try it out. Does it work with
>>>>>> >>>>>> crossed random effects and how does it scale with more data? I
>>>>>> will
>>>>>> >>>>>> check the docu and try it though. Thanks for the info.
>>>>>> >>>>>>
>>>>>> >>>>>> On 23.06.2016 19:14, Ben Bolker wrote:
>>>>>> >>>>>>
>>>>>> >>>>>>>    I would also comment that glmmTMB is likely to be much
>>>>>> faster
>>>>>> >>>>>>> than the
>>>>>> >>>>>>> lme4-based EM approach ...
>>>>>> >>>>>>>
>>>>>> >>>>>>>    cheers
>>>>>> >>>>>>>      Ben B.
>>>>>> >>>>>>>
>>>>>> >>>>>>> On 16-06-23 12:47 PM, Mollie Brooks wrote:
>>>>>> >>>>>>>
>>>>>> >>>>>>>> Hi Philipp,
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> You could also try fitting the model with and without ZI
>>>>>> using
>>>>>> >>>>>>>> either
>>>>>> >>>>>>>> glmmADMB or glmmTMB. Then compare the AICs. I believe model
>>>>>> >>>>>>>> selection
>>>>>> >>>>>>>> is useful for this, but I could be missing something since
>>>>>> the
>>>>>> >>>>>>>> simulation procedure that Thierry described seems to
>>>>>> recommended
>>>>>> >>>>>>>> more
>>>>>> >>>>>>>> often.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> <https://github.com/glmmTMB/glmmTMB>
>>>>>> https://github.com/glmmTMB/glmmTMB
>>>>>> >>>>>>>> <http://glmmadmb.r-forge.r-project.org>
>>>>>> http://glmmadmb.r-forge.r-project.org
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> glmmTMB is still in the development phase, but we’ve done a
>>>>>> lot of
>>>>>> >>>>>>>> testing.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> cheers, Mollie
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> ------------------------ Mollie Brooks, PhD Postdoctoral
>>>>>> Researcher,
>>>>>> >>>>>>>> Population Ecology Research Group Department of Evolutionary
>>>>>> Biology
>>>>>> >>>>>>>> & Environmental Studies, University of Zürich
>>>>>> >>>>>>>> <http://www.popecol.org/team/mollie-brooks/>
>>>>>> http://www.popecol.org/team/mollie-brooks/
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> On 23Jun 2016, at 8:22, Philipp Singer < <killver op gmail.com>
>>>>>> killver op gmail.com> wrote:
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>> Thanks, great information, that is really helpful.
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>> I agree that those are different things, however when using
>>>>>> a
>>>>>> >>>>>>>>> random effect for overdispersion, I can simulate the same
>>>>>> number of
>>>>>> >>>>>>>>> zero outcomes (~95%).
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>> On 23.06.2016 15:50, Thierry Onkelinx wrote:
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>>> Be careful when using overdispersion to model
>>>>>> zero-inflation.
>>>>>> >>>>>>>>>> Those are two different things.
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> I've put some information together in
>>>>>> >>>>>>>>>> <http://rpubs.com/INBOstats/zeroinflation>
>>>>>> http://rpubs.com/INBOstats/zeroinflation
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en
>>>>>> bosonderzoek /
>>>>>> >>>>>>>>>> Research Institute for Nature and Forest team Biometrie &
>>>>>> >>>>>>>>>> Kwaliteitszorg / team Biometrics & Quality Assurance
>>>>>> >>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> To call in the statistician after the experiment is done
>>>>>> may be
>>>>>> >>>>>>>>>> no more than asking him to perform a post-mortem
>>>>>> examination: he
>>>>>> >>>>>>>>>> may be able to say what the experiment died of. ~ Sir
>>>>>> Ronald
>>>>>> >>>>>>>>>> Aylmer Fisher The plural of anecdote is not data. ~ Roger
>>>>>> >>>>>>>>>> Brinner The combination of some data and an aching desire
>>>>>> for an
>>>>>> >>>>>>>>>> answer does not ensure that a reasonable answer can be
>>>>>> extracted
>>>>>> >>>>>>>>>> from a given body of data. ~ John Tukey
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> 2016-06-23 12:42 GMT+02:00 Philipp Singer <
>>>>>> <killver op gmail.com>killver op gmail.com
>>>>>> >>>>>>>>>> <mailto: <killver op gmail.com>killver op gmail.com <mailto:
>>>>>> <killver op gmail.com>killver op gmail.com>>>:
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> Thanks! Actually, accounting for overdispersion is super
>>>>>> >>>>>>>>>> important as it seems, then the zeros can be captured well.
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> On 23.06.2016 11:50, Thierry Onkelinx wrote:
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>>> Dear Philipp,
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> 1. Fit a Poisson model to the data. 2. Simulate a new
>>>>>> response
>>>>>> >>>>>>>>>>> vector for the dataset according to the model. 3. Count
>>>>>> the
>>>>>> >>>>>>>>>>> number of zero's in the simulated response vector. 4.
>>>>>> Repeat
>>>>>> >>>>>>>>>>> step 2 and 3 a decent number of time and plot a histogram
>>>>>> of
>>>>>> >>>>>>>>>>> the number of zero's in the simulation. If the number of
>>>>>> zero's
>>>>>> >>>>>>>>>>> in the original dataset is larger than those in the
>>>>>> >>>>>>>>>>> simulations, then the model can't capture all zero's. In
>>>>>> such
>>>>>> >>>>>>>>>>> case, first try to update the model and repeat the
>>>>>> procedure.
>>>>>> >>>>>>>>>>> If that fails, look for zero-inflated models.
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> Best regards,
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en
>>>>>> bosonderzoek /
>>>>>> >>>>>>>>>>> Research Institute for Nature and Forest team Biometrie &
>>>>>> >>>>>>>>>>> Kwaliteitszorg / team Biometrics & Quality Assurance
>>>>>> >>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> To call in the statistician after the experiment is done
>>>>>> may
>>>>>> >>>>>>>>>>> be no more than asking him to perform a post-mortem
>>>>>> >>>>>>>>>>> examination: he may be able to say what the experiment
>>>>>> died of.
>>>>>> >>>>>>>>>>> ~ Sir Ronald Aylmer Fisher The plural of anecdote is not
>>>>>> data.
>>>>>> >>>>>>>>>>> ~ Roger Brinner The combination of some data and an aching
>>>>>> >>>>>>>>>>> desire for an answer does not ensure that a reasonable
>>>>>> answer
>>>>>> >>>>>>>>>>> can be extracted from a given body of data. ~ John Tukey
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> 2016-06-23 11:27 GMT+02:00 Philipp Singer <
>>>>>> <killver op gmail.com>killver op gmail.com
>>>>>> >>>>>>>>>>> <mailto: <killver op gmail.com>killver op gmail.com <mailto:
>>>>>> <killver op gmail.com>killver op gmail.com>>>:
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> Thanks Thierry - That totally makes sense. Is there some
>>>>>> way of
>>>>>> >>>>>>>>>>> formally checking that, except thinking about the setting
>>>>>> and
>>>>>> >>>>>>>>>>> underlying processes?
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> On 23.06.2016 11:04, Thierry Onkelinx wrote:
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> Dear Philipp,
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> Do you have just lots of zero's, or more zero's than the
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> Poisson
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> distribution can explain? Those are two different things.
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> The example
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> below generates data from a Poisson distribution and has
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> 99% zero's
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> but no zero-inflation. The second example has only 1%
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> zero's but is
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> clearly zero-inflated.
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01)
>>>>>> mean(sim
>>>>>> >>>>>>>>>>>> == 0) hist(sim)
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> lambda = 1000)
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> mean(sim.infl == 0) hist(sim.infl)
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> So before looking for zero-inflated models, try to model
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> the zero's.
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> Best regards,
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur- en
>>>>>> bosonderzoek /
>>>>>> >>>>>>>>>>>> Research Institute
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> for Nature
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> and Forest team Biometrie & Kwaliteitszorg / team
>>>>>> Biometrics
>>>>>> >>>>>>>>>>>> & Quality
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> Assurance
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> To call in the statistician after the experiment is done
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> may be no
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> more than asking him to perform a post-mortem
>>>>>> examination:
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> he may be
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> able to say what the experiment died of. ~ Sir Ronald
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> Aylmer Fisher
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> The plural of anecdote is not data. ~ Roger Brinner The
>>>>>> >>>>>>>>>>>> combination of some data and an aching desire for an
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> answer does
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> not ensure that a reasonable answer can be extracted
>>>>>> from a
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> given body
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> of data. ~ John Tukey
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> 2016-06-23 10:07 GMT+02:00 Philipp Singer
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> < <killver op gmail.com>killver op gmail.com <mailto:
>>>>>> <killver op gmail.com>killver op gmail.com>
>>>>>> >>>>>>>>>>> <mailto: <killver op gmail.com>killver op gmail.com <mailto:
>>>>>> <killver op gmail.com>killver op gmail.com>>
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> <mailto: <killver op gmail.com>killver op gmail.com <mailto:
>>>>>> <killver op gmail.com>killver op gmail.com>
>>>>>> >>>>>>>>>>>> <mailto: <killver op gmail.com>killver op gmail.com <mailto:
>>>>>> <killver op gmail.com>killver op gmail.com>>>>:
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> Dear group - I am currently fitting a Poisson glmer
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> where I have
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> an excess of outcomes that are zero (>95%). I am now
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> debating on
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> how to proceed and came up with three options:
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> 1.) Just fit a regular glmer to the complete data. I am
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> not fully
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> sure how interpret the coefficients then, are they more
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> optimizing
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> towards distinguishing zero and non-zero, or also
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> capturing the
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> differences in those outcomes that are non-zero?
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> 2.) Leave all zeros out of the data and fit a glmer to
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> only those
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> outcomes that are non-zero. Then, I would only learn
>>>>>> about
>>>>>> >>>>>>>>>>>> differences in the non-zero outcomes though.
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> 3.) Use a zero-inflated Poisson model. My data is quite
>>>>>> >>>>>>>>>>>> large-scale, so I am currently playing around with the EM
>>>>>> >>>>>>>>>>>> implementation of Bolker et al. that alternates between
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> fitting a
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> glmer with data that are weighted according to their zero
>>>>>> >>>>>>>>>>>> probability, and fitting a logistic regression for the
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> probability
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> that a data point is zero. The method is elaborated for
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> the OWL
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> data in:
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>
>>>>>> <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
>>>>>> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
>>>>>> >>>>>>>>>>> <
>>>>>> >>>>>>>>>>>
>>>>>> <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
>>>>>> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
>>>>>> >>>>>>>>>>> >
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>> I am not fully sure how to interpret the results for the
>>>>>> >>>>>>>
>>>>>> >>>>>>>> zero-inflated version though. Would I need to interpret the
>>>>>> >>>>>>>>>>>> coefficients for the result of the glmer similar to as
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> I would do
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> for my idea of 2)? And then on top of that interpret the
>>>>>> >>>>>>>>>>>> coefficients for the logistic regression regarding
>>>>>> whether
>>>>>> >>>>>>>>>>>> something is in the perfect or imperfect state? I am
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> also not
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> quite sure what the common approach for the zformula is
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> here. The
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> OWL elaborations only use zformula=z~1, so no random
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> effect; I
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> would use the same formula as for the glmer.
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> I am appreciating some help and pointers.
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> Thanks! Philipp
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> _______________________________________________
>>>>>> >>>>>>>>>>>> <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org
>>>>>> >>>>>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org>
>>>>>> >>>>>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org>
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org
>>>>>> >>>>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org>>
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org
>>>>>> >>>>>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org>
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org
>>>>>> >>>>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org>>> mailing list
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>>
>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>> >>>>>>>>>>>> <
>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>>
>>>>>> >>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> _______________________________________________
>>>>>> >>>>>>>>>>> <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org
>>>>>> >>>>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org>
>>>>>> >>>>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org>
>>>>>> >>>>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org
>>>>>> >>>>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org>> mailing list
>>>>>> >>>>>>>>>>>
>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>> >>>>>>>>>>> <
>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>>
>>>>>> >>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>> _______________________________________________
>>>>>> >>>>>>>>> <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org
>>>>>> >>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org>
>>>>>> >>>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org> mailing list
>>>>>> >>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>> >>>>>>>>> <
>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>> [[alternative HTML version deleted]]
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> _______________________________________________
>>>>>> >>>>>>>> <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org
>>>>>> >>>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org> mailing list
>>>>>> >>>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> _______________________________________________
>>>>>> >>>>>>> <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org
>>>>>> >>>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org> mailing list
>>>>>> >>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>> >>>>>>>
>>>>>> >>>>>> _______________________________________________
>>>>>> >>>>>> <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org
>>>>>> >>>>>> <mailto: <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org> mailing list
>>>>>> >>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>> >>>>>>
>>>>>> >>>>>
>>>>>> >>>         [[alternative HTML version deleted]]
>>>>>> >>>
>>>>>> >>> _______________________________________________
>>>>>> >>> <R-sig-mixed-models op r-project.org>
>>>>>> R-sig-mixed-models op r-project.org mailing list
>>>>>> >>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>> >>>
>>>>>> >>>
>>>>>> >
>>>>>>
>>>>>>         [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> <R-sig-mixed-models op r-project.org>R-sig-mixed-models op r-project.org
>>>>>> mailing list
>>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>

	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list