[R-sig-ME] Question about zero-inflated Poisson glmer
Philipp Singer
killver at gmail.com
Tue Jun 28 10:14:04 CEST 2016
You can find a sample of the data here:
https://www.dropbox.com/s/kqqxmc3wp225lug/r_sample.csv.gz?dl=1

You can think of the setting as the popularity of items "y" inside stores
"id", explained by two features "a" and "b", where "a" is more of a
control covariate and I am interested in whether "b" has a positive impact.
The current baseline formula is "y~1+b+(1|id)". Extending the formula,
e.g. to "y~1+b+a+(1|id)" or "y~1+b+a+(1|id)+(0+b|id)", does improve the
model fit, but it does not solve my core problem of strange predictions.
My main goal is to make inference on "b", but I cannot really trust the
significant coefficients given the poor model fit.
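
For reference, a minimal sketch of how these candidate models can be
fitted with zero inflation in glmmTMB, plus a quick check on the
proportion of zeros (the ziformula/simulate interface shown is the
current glmmTMB one, and the column names are those in the sample file):

library(glmmTMB)

d <- read.csv("r_sample.csv.gz")   # the sample linked above

# baseline and extended conditional models, each with a constant
# zero-inflation probability; swap in family = nbinom2() for the
# zero-inflated negative binomial discussed further down
m0 <- glmmTMB(y ~ 1 + b + (1 | id),
              ziformula = ~ 1, family = poisson, data = d)
m1 <- glmmTMB(y ~ 1 + b + a + (1 | id),
              ziformula = ~ 1, family = poisson, data = d)
m2 <- glmmTMB(y ~ 1 + b + a + (1 | id) + (0 + b | id),
              ziformula = ~ 1, family = poisson, data = d)

# quick check: proportion of zeros in one simulated response vs. the data
mean(simulate(m1)[[1]] == 0)
mean(d$y == 0)
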
Thanks a lot for your help again!
Philipp
On 28.06.2016 09:57, Thierry Onkelinx wrote:
> It's hard to tell what's wrong without any knowledge of the model. A
> reproducible example would be handy.
>
> Some things to check:
> - How strong is the zero-inflation according to the model?
> - What are the fitted values exactly? The response? The predicted mean
> of the counts? The probability of extra zeros?
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
> and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no
> more than asking him to perform a post-mortem examination: he may be
> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does
> not ensure that a reasonable answer can be extracted from a given body
> of data. ~ John Tukey
>
> 2016-06-28 9:39 GMT+02:00 Philipp Singer <killver at gmail.com>:
>
>     Unfortunately, if I model the data with a zero-inflated negative
>     binomial model (which appears to be the most appropriate model to
>     me), the fitted values are never zero but hover around a mean of
>     20, even though, as said, my data contains around 95% zeros.
>
>     I thought about hurdle models as well, but a zero-inflated model
>     definitely fits the process better.
>
> On 27.06.2016 21:59, Thierry Onkelinx wrote:
>> If there is overdispersion, then try a negative binomial model or
>> a zero-inflated negative binomial model. If not, try a
>> zero-inflated Poisson. Adding relevant covariates can reduce
>> overdispersion.
>>
>>
>>     2016-06-27 17:46 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>
>>         Well, as posted before, the std. dev. is 9.5 ... so that
>>         does not seem too good then :/
>>
>> Any other idea?
>>
>>
>> On 27.06.2016 17:31, Thierry Onkelinx wrote:
>>> Dear Philipp,
>>>
>>> You've been bitten by observation-level random effects (OLRE).
>>> I've put together a document about them at
>>> http://rpubs.com/INBOstats/OLRE. Bottom line: you're OK-ish
>>> when the standard deviation of the OLRE is smaller than 1.
>>> You're in trouble when it's above 3. In between, you need to
>>> check the model carefully.
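>>>
>>> (A minimal check, not from the rpubs document itself: with lme4 and
>>> an observation-level factor "obs" that has one level per row, the
>>> OLRE standard deviation can be read off the VarCorr output:)
>>>
>>> library(lme4)
>>> data$obs <- factor(seq_len(nrow(data)))
>>> fit <- glmer(y ~ 1 + b + (1 | id) + (1 | obs),
>>>              data = data, family = poisson)
>>> print(VarCorr(fit))               # lists the OLRE std. dev.
>>> attr(VarCorr(fit)$obs, "stddev")  # or extract it programmatically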
>>>
>>> Best regards,
>>>
>>> ir. Thierry Onkelinx
>>> Instituut voor natuur- en bosonderzoek / Research Institute
>>> for Nature and Forest
>>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality
>>> Assurance
>>> Kliniekstraat 25
>>> 1070 Anderlecht
>>> Belgium
>>>
>>> To call in the statistician after the experiment is done may
>>> be no more than asking him to perform a post-mortem
>>> examination: he may be able to say what the experiment died
>>> of. ~ Sir Ronald Aylmer Fisher
>>> The plural of anecdote is not data. ~ Roger Brinner
>>> The combination of some data and an aching desire for an
>>> answer does not ensure that a reasonable answer can be
>>> extracted from a given body of data. ~ John Tukey
>>>
>>> 2016-06-27 16:17 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>
>>> Here is the fitted vs. residual plot for the observation-level
>>> Poisson model, with the observation-level random effect removed
>>> for prediction as described in:
>>> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html
>>>
>>> So basically the prediction is always close to zero.
>>>
>>> Note that this is just on a very small sample (1,000 data points).
>>>
>>> If I fit a nbinom2 model to this small sample, I get predictions
>>> that are always around ~20 (but never zero). Both plots are attached.
>>>
>>> What I am wondering is whether I can do inference on a fixed-effect
>>> parameter in my model, which is the main task of this study. The
>>> effect is similar across the different models, and in general I am
>>> only interested in whether it is positive/negative and "significant",
>>> which it is. However, as can be seen, the predictions do not look
>>> too good here.
>>>
>>>
>>>
>>>
>>> 2016-06-27 15:18 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>
>>> The variance is:
>>>
>>> Conditional model:
>>>  Groups  Name         Variance   Std.Dev.
>>>  obs     (Intercept)  8.991e+01  9.4823139
>>>
>>>
>>>
>>> 2016-06-27 15:06 GMT+02:00 Thierry Onkelinx <thierry.onkelinx at inbo.be>:
>>>
>>> Dear Philipp,
>>>
>>> How strong is the variance of the observation-level random effect?
>>> I wouldn't trust a model with a large OLRE variance.
>>>
>>> Best regards,
>>>
>>> Thierry
>>>
>>>
>>> 2016-06-27 14:59 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>
>>> I have now played around more with the data and the models, using
>>> both lme4 and glmmTMB.
>>>
>>> I can report the following:
>>>
>>> Modeling the data with a zero-inflated Poisson improves the model
>>> significantly. However, when I call predict and then simulate with
>>> rpois, I end up with nearly no values that are zero (in the original
>>> data about 96% are zero).
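>>>
>>> (For reference, a hand-rolled sketch of simulating from a
>>> zero-inflated Poisson; "mu" and "p0" stand for assumed
>>> per-observation conditional means and structural-zero probabilities,
>>> not the output of any particular predict() call:)
>>>
>>> sim_y <- rbinom(length(mu), size = 1, prob = 1 - p0) *
>>>          rpois(length(mu), lambda = mu)
>>> mean(sim_y == 0)   # compare with mean(data$y == 0)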
>>>
>>> When I model the data with overdispersion by including an
>>> observation-level random effect, I can also improve the model (not
>>> surprisingly, given the extra random effect). When I predict
>>> outcomes while ignoring the observation-level random effect (in
>>> lme4), the predictions are poor compared to the original data: many
>>> zeros can be captured (of course), but the positive outcomes are not
>>> captured well.
>>>
>>> Combining zero inflation and overdispersion further improves the
>>> model, but I can only do that with glmmTMB, and there I have trouble
>>> making predictions that ignore the observation-level random effect.
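>>>
>>> (A hand-rolled sketch of one way to get population-level
>>> expectations from the fixed effects alone; the design formulas and
>>> the fixef()$cond / fixef()$zi components below are assumptions about
>>> a zero-inflated glmmTMB fit called "fit", not something quoted from
>>> the glmmTMB documentation:)
>>>
>>> X  <- model.matrix(~ 1 + b + a, data)  # conditional-model design
>>> Xz <- model.matrix(~ 1, data)          # zero-inflation design
>>> mu <- exp(X %*% fixef(fit)$cond)       # conditional mean (log link)
>>> p0 <- plogis(Xz %*% fixef(fit)$zi)     # structural-zero prob. (logit link)
>>> ey <- (1 - p0) * mu                    # expected response, REs ignored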
>>>
>>> Another side question:
>>>
>>> In lme4, when I do
>>>
>>> # (random-effects term assumed here; re.form only applies to mixed models)
>>> m <- glmer(y ~ 1 + (1 | id), data = data, family = poisson)
>>> rpois(n = nrow(data),
>>>       lambda = predict(m, type = "response", re.form = NA))
>>>
>>> vs.
>>>
>>> simulate(m, nsim = 1, re.form = NA)
>>>
>>> I get different outcomes. Am I misunderstanding these functions?
>>>
>>> Would appreciate some more help/pointers!
>>>
>>> Thanks,
>>> Philipp
>>>
>>> 2016-06-24 15:52 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>
>>> > Thanks - I started an issue there to answer some of my questions.
>>> >
>>> > Regarding the installation: I was trying to do it in Anaconda with
>>> > a specific R kernel and had some issues. I am trying to sort that
>>> > out with the Anaconda people; if I end up with a tutorial on how to
>>> > properly set up glmmTMB in Anaconda, I will let you know. The
>>> > install worked fine in my standard R environment.
>>> >
>>> > On 24.06.2016 15:40, Ben Bolker wrote:
>>> >
>>> >> Probably for now the glmmTMB issues page is best.
>>> >>
>>> >> When you go there:
>>> >>
>>> >>  - details on installation problems/hiccups would be useful
>>> >>  - a reproducible example for the problem listed below would be useful
>>> >>  - dispformula is for allowing dispersion/residual variance to vary
>>> >>    with covariates (i.e., modeling heteroscedasticity)
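>>> >>
>>> >> (As an illustration of that last point only: a toy call, with the
>>> >> formula and variable names borrowed from the error report below,
>>> >> and nbinom2 used simply because it has a dispersion parameter to
>>> >> model:)
>>> >>
>>> >> library(glmmTMB)
>>> >> fit <- glmmTMB(y ~ 1 + x + (1 | b), data = data,
>>> >>                family = nbinom2(),
>>> >>                dispformula = ~ 1 + x)  # dispersion varies with x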
>>> >>
>>> >> cheers
>>> >> Ben Bolker
>>> >>
>>> >>
>>> >> On 16-06-24 09:13 AM, Philipp Singer wrote:
>>> >>
>>> >>> Update: I tried it like that, but I receive an error message.
>>> >>>
>>> >>> Warning message:
>>> >>> In nlminb(start = par, objective = fn, gradient = gr):
>>> >>>   NA/NaN function evaluation
>>> >>>
>>> >>> Error in solve.default(hessian.fixed):
>>> >>>   Lapack routine dgesv: system is exactly singular: U[3,3] = 0
>>> >>> Traceback:
>>> >>>
>>> >>> 1. glmmTMB(y ~ 1 + x + (1 | b),
>>> >>> .      data = data, family = "poisson", dispformula = ~1 + x)
>>> >>> 2. sdreport(obj)
>>> >>> 3. solve(hessian.fixed)
>>> >>> 4. solve(hessian.fixed)
>>> >>> 5. solve.default(hessian.fixed)
>>> >>>
>>> >>> Any ideas on that?
>>> >>>
>>> >>> BTW: Is it fine to post glmmTMB questions here, or should I rather
>>> >>> use the GitHub issues page, or is there maybe a dedicated mailing
>>> >>> list?
>>> >>>
>>> >>> Thanks,
>>> >>> Philipp
>>> >>>
>>> >>> On 24.06.2016 14:35, Philipp Singer wrote:
>>> >>>
>>> >>>> It indeed seems to run quite fast; I had some trouble installing,
>>> >>>> but it works now on my R 3.3 setup.
>>> >>>>
>>> >>>> One question I have concerns the specification of the dispersion,
>>> >>>> since I need to specify dispformula: what is the difference
>>> >>>> between putting only fixed effects in it vs. also random effects?
>>> >>>>
>>> >>>> On 23.06.2016 23:07, Mollie Brooks wrote:
>>> >>>>
>>> >>>>> glmmTMB does crossed RE. Ben did some timings in
>>> >>>>> vignette("glmmTMB") and it was 2.3 times faster than glmer for
>>> >>>>> one simple GLMM.
>>> >>>>>
>>> >>>>>
>>> >>>>>> On 23 Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote:
>>> >>>>>>
>>> >>>>>> I did try glmmADMB, but unfortunately it is way too slow for my
>>> >>>>>> data.
>>> >>>>>>
>>> >>>>>> I did not know about glmmTMB; I will try it out. Does it work
>>> >>>>>> with crossed random effects, and how does it scale with more
>>> >>>>>> data? I will check the documentation and give it a try. Thanks
>>> >>>>>> for the info.
>>> >>>>>>
>>> >>>>>> On 23.06.2016 19:14, Ben Bolker wrote:
>>> >>>>>>
>>> >>>>>>> I would also comment that glmmTMB is likely to be much faster
>>> >>>>>>> than the lme4-based EM approach ...
>>> >>>>>>>
>>> >>>>>>> cheers
>>> >>>>>>> Ben B.
>>> >>>>>>>
>>> >>>>>>> On 16-06-23 12:47 PM, Mollie Brooks wrote:
>>> >>>>>>>
>>> >>>>>>>> Hi Philipp,
>>> >>>>>>>>
>>> >>>>>>>> You could also try fitting the model with and without ZI using
>>> >>>>>>>> either glmmADMB or glmmTMB, then compare the AICs. I believe
>>> >>>>>>>> model selection is useful for this, but I could be missing
>>> >>>>>>>> something, since the simulation procedure that Thierry
>>> >>>>>>>> described seems to be recommended more often.
>>> >>>>>>>>
>>> >>>>>>>> https://github.com/glmmTMB/glmmTMB
>>> >>>>>>>> http://glmmadmb.r-forge.r-project.org
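>>> >>>>>>>>
>>> >>>>>>>> (A toy sketch of that AIC comparison with glmmTMB; the formula
>>> >>>>>>>> and variable names are borrowed from elsewhere in this thread,
>>> >>>>>>>> and ziformula is the current glmmTMB argument for the
>>> >>>>>>>> zero-inflation part:)
>>> >>>>>>>>
>>> >>>>>>>> library(glmmTMB)
>>> >>>>>>>> fit_pois <- glmmTMB(y ~ 1 + b + (1 | id), data = data,
>>> >>>>>>>>                     family = poisson)
>>> >>>>>>>> fit_zip  <- glmmTMB(y ~ 1 + b + (1 | id), ziformula = ~ 1,
>>> >>>>>>>>                     data = data, family = poisson)
>>> >>>>>>>> AIC(fit_pois, fit_zip)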
>>> >>>>>>>>
>>> >>>>>>>> glmmTMB is still in the development phase, but we've done a
>>> >>>>>>>> lot of testing.
>>> >>>>>>>>
>>> >>>>>>>> cheers, Mollie
>>> >>>>>>>>
>>> >>>>>>>> ------------------------
>>> >>>>>>>> Mollie Brooks, PhD
>>> >>>>>>>> Postdoctoral Researcher, Population Ecology Research Group
>>> >>>>>>>> Department of Evolutionary Biology & Environmental Studies,
>>> >>>>>>>> University of Zürich
>>> >>>>>>>> http://www.popecol.org/team/mollie-brooks/
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> On 23 Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> Thanks, great information, that is really helpful.
>>> >>>>>>>>>
>>> >>>>>>>>> I agree that those are different things; however, when using
>>> >>>>>>>>> a random effect for overdispersion, I can simulate the same
>>> >>>>>>>>> number of zero outcomes (~95%).
>>> >>>>>>>>>
>>> >>>>>>>>> On 23.06.2016 15:50, Thierry Onkelinx wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>> Be careful when using overdispersion to model
>>> >>>>>>>>>> zero-inflation. Those are two different things.
>>> >>>>>>>>>>
>>> >>>>>>>>>> I've put some information together in
>>> >>>>>>>>>> http://rpubs.com/INBOstats/zeroinflation
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Thanks! It seems that accounting for overdispersion is
>>> >>>>>>>>>> super important; with it, the zeros can be captured well.
>>> >>>>>>>>>>
>>> >>>>>>>>>> On 23.06.2016 11:50, Thierry Onkelinx wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>> Dear Philipp,
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> 1. Fit a Poisson model to the data.
>>> >>>>>>>>>>> 2. Simulate a new response vector for the dataset
>>> >>>>>>>>>>>    according to the model.
>>> >>>>>>>>>>> 3. Count the number of zeros in the simulated response
>>> >>>>>>>>>>>    vector.
>>> >>>>>>>>>>> 4. Repeat steps 2 and 3 a decent number of times and plot
>>> >>>>>>>>>>>    a histogram of the number of zeros in the simulations.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> If the number of zeros in the original dataset is larger
>>> >>>>>>>>>>> than those in the simulations, then the model can't
>>> >>>>>>>>>>> capture all the zeros. In that case, first try to update
>>> >>>>>>>>>>> the model and repeat the procedure. If that fails, look
>>> >>>>>>>>>>> for zero-inflated models.
>>> >>>>>>>>>>>
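>>> >>>>>>>>>>> (A minimal sketch of this check, assuming a fitted model
>>> >>>>>>>>>>> object called "fit" with a simulate() method and the
>>> >>>>>>>>>>> observed response in data$y:)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> n_sim    <- 1000
>>> >>>>>>>>>>> sim_zero <- replicate(n_sim, sum(simulate(fit)[[1]] == 0))
>>> >>>>>>>>>>> hist(sim_zero, main = "simulated number of zeros")
>>> >>>>>>>>>>> abline(v = sum(data$y == 0), col = "red")  # observed zeros
>>> >>>>>>>>>>>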
>>> >>>>>>>>>>> Best regards,
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Thanks Thierry - that totally makes sense. Is there some
>>> >>>>>>>>>>> way of formally checking that, other than thinking about
>>> >>>>>>>>>>> the setting and the underlying processes?
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On 23.06.2016 11:04, Thierry Onkelinx wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> Dear Philipp,
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Do you have just lots of zeros, or more zeros than the
>>> >>>>>>>>>>>> Poisson distribution can explain? Those are two different
>>> >>>>>>>>>>>> things. The first example below generates data from a
>>> >>>>>>>>>>>> Poisson distribution and has 99% zeros but no
>>> >>>>>>>>>>>> zero-inflation. The second example has only 1% zeros but
>>> >>>>>>>>>>>> is clearly zero-inflated.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> set.seed(1)
>>> >>>>>>>>>>>> n <- 1e8
>>> >>>>>>>>>>>> sim <- rpois(n, lambda = 0.01)
>>> >>>>>>>>>>>> mean(sim == 0)
>>> >>>>>>>>>>>> hist(sim)
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> sim.infl <- rbinom(n, size = 1, prob = 0.99) *
>>> >>>>>>>>>>>>             rpois(n, lambda = 1000)
>>> >>>>>>>>>>>> mean(sim.infl == 0)
>>> >>>>>>>>>>>> hist(sim.infl)
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> So before looking for zero-inflated models, try to model
>>> >>>>>>>>>>>> the zeros.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Best regards,
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Dear group - I am currently fitting a Poisson glmer where
>>> >>>>>>>>>>>> I have an excess of outcomes that are zero (>95%). I am
>>> >>>>>>>>>>>> now debating how to proceed and came up with three options:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 1.) Just fit a regular glmer to the complete data. I am
>>> >>>>>>>>>>>> not fully sure how to interpret the coefficients then: are
>>> >>>>>>>>>>>> they mostly driven by distinguishing zero from non-zero
>>> >>>>>>>>>>>> outcomes, or do they also capture the differences among
>>> >>>>>>>>>>>> the non-zero outcomes?
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 2.) Leave all zeros out of the data and fit a glmer to
>>> >>>>>>>>>>>> only those outcomes that are non-zero. Then I would only
>>> >>>>>>>>>>>> learn about differences among the non-zero outcomes,
>>> >>>>>>>>>>>> though.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 3.) Use a zero-inflated Poisson model. My data is quite
>>> >>>>>>>>>>>> large-scale, so I am currently playing around with the EM
>>> >>>>>>>>>>>> implementation of Bolker et al. that alternates between
>>> >>>>>>>>>>>> fitting a glmer with data that are weighted according to
>>> >>>>>>>>>>>> their zero probability, and fitting a logistic regression
>>> >>>>>>>>>>>> for the probability that a data point is zero. The method
>>> >>>>>>>>>>>> is elaborated for the OWL data in:
>>> >>>>>>>>>>>> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
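>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> (For concreteness, a minimal sketch of such an alternating
>>> >>>>>>>>>>>> scheme; the formulas, column names ("y", "b", "id"),
>>> >>>>>>>>>>>> starting values and fixed number of iterations are my own
>>> >>>>>>>>>>>> assumptions, not code taken from the owls write-up:)
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> library(lme4)
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> zip_em <- function(data, n_iter = 20) {
>>> >>>>>>>>>>>>   # start: treat every observed zero as 50% likely to be
>>> >>>>>>>>>>>>   # a structural zero
>>> >>>>>>>>>>>>   data$z <- ifelse(data$y == 0, 0.5, 0)
>>> >>>>>>>>>>>>   for (i in seq_len(n_iter)) {
>>> >>>>>>>>>>>>     data$w <- 1 - data$z
>>> >>>>>>>>>>>>     # count part: Poisson glmer, down-weighting likely
>>> >>>>>>>>>>>>     # structural zeros
>>> >>>>>>>>>>>>     fit_count <- glmer(y ~ 1 + b + (1 | id), data = data,
>>> >>>>>>>>>>>>                        family = poisson, weights = w)
>>> >>>>>>>>>>>>     # zero part: probability of a structural zero
>>> >>>>>>>>>>>>     fit_zero <- glm(z ~ 1, data = data,
>>> >>>>>>>>>>>>                     family = quasibinomial)
>>> >>>>>>>>>>>>     # E-step: posterior probability that an observed zero
>>> >>>>>>>>>>>>     # is structural
>>> >>>>>>>>>>>>     lambda <- predict(fit_count, type = "response")
>>> >>>>>>>>>>>>     p      <- predict(fit_zero,  type = "response")
>>> >>>>>>>>>>>>     data$z <- ifelse(data$y == 0,
>>> >>>>>>>>>>>>                      p / (p + (1 - p) * exp(-lambda)), 0)
>>> >>>>>>>>>>>>   }
>>> >>>>>>>>>>>>   list(count = fit_count, zero = fit_zero, z = data$z)
>>> >>>>>>>>>>>> }
>>> >>>>>>>>>>>>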
>>> >>>>>>>>>>>> I am not fully sure how to interpret the results for the
>>> >>>>>>>>>>>> zero-inflated version, though. Would I need to interpret
>>> >>>>>>>>>>>> the coefficients from the glmer part similarly to what I
>>> >>>>>>>>>>>> would do for my idea 2), and then, on top of that,
>>> >>>>>>>>>>>> interpret the coefficients of the logistic regression as
>>> >>>>>>>>>>>> describing whether an observation is in the perfect or
>>> >>>>>>>>>>>> imperfect state? I am also not quite sure what the common
>>> >>>>>>>>>>>> approach for the zformula is here. The OWL elaborations
>>> >>>>>>>>>>>> only use zformula = z ~ 1, so no random effect; I would
>>> >>>>>>>>>>>> use the same formula as for the glmer.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I would appreciate some help and pointers.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Thanks!
>>> >>>>>>>>>>>> Philipp