[R-sig-ME] Question about zero-inflated Poisson glmer
Philipp Singer
killver at gmail.com
Tue Jun 28 10:14:04 CEST 2016
You can find a sample of the data here:
https://www.dropbox.com/s/kqqxmc3wp225lug/r_sample.csv.gz?dl=1

You can think of the setting as the popularity of items "y" inside stores
"id", explained by two features "a" and "b", where "a" is more of a
control covariate and I am interested in whether "b" has a positive impact.
The current baseline formula is "y~1+b+(1|id)". Extending the formula,
e.g. to "y~1+b+a+(1|id)" or "y~1+b+a+(1|id)+(0+b|id)", does improve the
model fit, but it does not solve my core problem of strange predictions.
My main goal is to make inference on "b", but I cannot really trust the
significant coefficients given the poor model fit.
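
For reference, a minimal sketch of how these candidate models can be
fitted with zero inflation in glmmTMB, plus a quick check on the
proportion of zeros (the ziformula/simulate interface shown is the
current glmmTMB one, and the column names are those in the sample file):

library(glmmTMB)

d <- read.csv("r_sample.csv.gz")   # the sample linked above

# baseline and extended conditional models, each with a constant
# zero-inflation probability; swap in family = nbinom2() for the
# zero-inflated negative binomial discussed further down
m0 <- glmmTMB(y ~ 1 + b + (1 | id),
              ziformula = ~ 1, family = poisson, data = d)
m1 <- glmmTMB(y ~ 1 + b + a + (1 | id),
              ziformula = ~ 1, family = poisson, data = d)
m2 <- glmmTMB(y ~ 1 + b + a + (1 | id) + (0 + b | id),
              ziformula = ~ 1, family = poisson, data = d)

# quick check: proportion of zeros in one simulated response vs. the data
mean(simulate(m1)[[1]] == 0)
mean(d$y == 0)
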
Thanks a lot for your help again!
Philipp
On 28.06.2016 09:57, Thierry Onkelinx wrote:
> It's hard to tell what's wrong without any knowledge of the model. A
> reproducible example would be handy.
>
> Some things to check:
> - How strong is the zero-inflation according to the model?
> - What are the fitted values exactly? The response? The predicted mean
> of the counts? The probability of extra zeros?
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
> and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no
> more than asking him to perform a post-mortem examination: he may be
> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does
> not ensure that a reasonable answer can be extracted from a given body
> of data. ~ John Tukey
>
> 2016-06-28 9:39 GMT+02:00 Philipp Singer <killver at gmail.com>:
>
>     Unfortunately, if I model the data with a zero-inflated negative
>     binomial model (which appears to be the most appropriate model to
>     me), the fitted values are never zero but hover around a mean of
>     20, even though, as said, my data contains around 95% zeros.
>
>     I thought about hurdle models as well, but a zero-inflated model
>     definitely fits the process better.
>
> On 27.06.2016 21:59, Thierry Onkelinx wrote:
>> If there is overdispersion, then try a negative binomial model or
>> a zero-inflated negative binomial model. If not, try a
>> zero-inflated Poisson. Adding relevant covariates can reduce
>> overdispersion.
>>
>>
>>     2016-06-27 17:46 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>
>>         Well, as posted before, the std. dev. is 9.5 ... so that
>>         does not seem too good then :/
>>
>> Any other idea?
>>
>>
>> On 27.06.2016 17:31, Thierry Onkelinx wrote:
>>> Dear Philipp,
>>>
>>> You've been bitten by observation-level random effects (OLRE).
>>> I've put together a document about them at
>>> http://rpubs.com/INBOstats/OLRE. Bottom line: you're OK-ish
>>> when the standard deviation of the OLRE is smaller than 1.
>>> You're in trouble when it's above 3. In between, you need to
>>> check the model carefully.
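>>>
>>> (A minimal check, not from the rpubs document itself: with lme4 and
>>> an observation-level factor "obs" that has one level per row, the
>>> OLRE standard deviation can be read off the VarCorr output:)
>>>
>>> library(lme4)
>>> data$obs <- factor(seq_len(nrow(data)))
>>> fit <- glmer(y ~ 1 + b + (1 | id) + (1 | obs),
>>>              data = data, family = poisson)
>>> print(VarCorr(fit))               # lists the OLRE std. dev.
>>> attr(VarCorr(fit)$obs, "stddev")  # or extract it programmatically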
>>>
>>> Best regards,
>>>
>>> ir. Thierry Onkelinx
>>> Instituut voor natuur- en bosonderzoek / Research Institute
>>> for Nature and Forest
>>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality
>>> Assurance
>>> Kliniekstraat 25
>>> 1070 Anderlecht
>>> Belgium
>>>
>>> To call in the statistician after the experiment is done may
>>> be no more than asking him to perform a post-mortem
>>> examination: he may be able to say what the experiment died
>>> of. ~ Sir Ronald Aylmer Fisher
>>> The plural of anecdote is not data. ~ Roger Brinner
>>> The combination of some data and an aching desire for an
>>> answer does not ensure that a reasonable answer can be
>>> extracted from a given body of data. ~ John Tukey
>>>
>>> 2016-06-27 16:17 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>
>>> Here is the fitted vs. residual plot for the observation-level
>>> Poisson model, with the observation-level random effect removed
>>> for prediction as described in:
>>> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html
>>>
>>> So basically the prediction is always close to zero.
>>>
>>> Note that this is just on a very small sample (1,000 data points).
>>>
>>> If I fit a nbinom2 model to this small sample, I get predictions
>>> that are always around ~20 (but never zero). Both plots are attached.
>>>
>>> What I am wondering is whether I can do inference on a fixed-effect
>>> parameter in my model, which is the main task of this study. The
>>> effect is similar across the different models, and in general I am
>>> only interested in whether it is positive/negative and "significant",
>>> which it is. However, as can be seen, the predictions do not look
>>> too good here.
>>>
>>>
>>>
>>>
>>> 2016-06-27 15:18 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>
>>> The variance is:
>>>
>>> Conditional model:
>>>  Groups  Name         Variance   Std.Dev.
>>>  obs     (Intercept)  8.991e+01  9.4823139
>>>
>>>
>>>
>>> 2016-06-27 15:06 GMT+02:00 Thierry Onkelinx <thierry.onkelinx at inbo.be>:
>>>
>>> Dear Philipp,
>>>
>>> How strong is the variance of the observation-level random effect?
>>> I wouldn't trust a model with a large OLRE variance.
>>>
>>> Best regards,
>>>
>>> Thierry
>>>
>>>
>>> 2016-06-27 14:59 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>
>>> I have now played around more with the data and the models, using
>>> both lme4 and glmmTMB.
>>>
>>> I can report the following:
>>>
>>> Modeling the data with a zero-inflated Poisson improves the model
>>> significantly. However, when I call predict and then simulate with
>>> rpois, I end up with nearly no values that are zero (in the original
>>> data about 96% are zero).
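>>>
>>> (For reference, a hand-rolled sketch of simulating from a
>>> zero-inflated Poisson; "mu" and "p0" stand for assumed
>>> per-observation conditional means and structural-zero probabilities,
>>> not the output of any particular predict() call:)
>>>
>>> sim_y <- rbinom(length(mu), size = 1, prob = 1 - p0) *
>>>          rpois(length(mu), lambda = mu)
>>> mean(sim_y == 0)   # compare with mean(data$y == 0)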
>>>
>>> When I model the data with overdispersion by including an
>>> observation-level random effect, I can also improve the model (not
>>> surprisingly, given the extra random effect). When I predict
>>> outcomes while ignoring the observation-level random effect (in
>>> lme4), the predictions are poor compared to the original data: many
>>> zeros can be captured (of course), but the positive outcomes are not
>>> captured well.
>>>
>>> Combining zero inflation and overdispersion further improves the
>>> model, but I can only do that with glmmTMB, and there I have trouble
>>> making predictions that ignore the observation-level random effect.
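>>>
>>> (A hand-rolled sketch of one way to get population-level
>>> expectations from the fixed effects alone; the design formulas and
>>> the fixef()$cond / fixef()$zi components below are assumptions about
>>> a zero-inflated glmmTMB fit called "fit", not something quoted from
>>> the glmmTMB documentation:)
>>>
>>> X  <- model.matrix(~ 1 + b + a, data)  # conditional-model design
>>> Xz <- model.matrix(~ 1, data)          # zero-inflation design
>>> mu <- exp(X %*% fixef(fit)$cond)       # conditional mean (log link)
>>> p0 <- plogis(Xz %*% fixef(fit)$zi)     # structural-zero prob. (logit link)
>>> ey <- (1 - p0) * mu                    # expected response, REs ignored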
>>>
>>> Another side question:
>>>
>>> In lme4, when I do
>>>
>>> # (random-effects term assumed here; re.form only applies to mixed models)
>>> m <- glmer(y ~ 1 + (1 | id), data = data, family = poisson)
>>> rpois(n = nrow(data),
>>>       lambda = predict(m, type = "response", re.form = NA))
>>>
>>> vs.
>>>
>>> simulate(m, nsim = 1, re.form = NA)
>>>
>>> I get different outcomes. Am I misunderstanding these functions?
>>>
>>> Would appreciate some more help/pointers!
>>>
>>> Thanks,
>>> Philipp
>>>
>>> 2016-06-24 15:52 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>>
>>> > Thanks - I started an issue there to answer some of my questions.
>>> >
>>> > Regarding the installation: I was trying to do it in Anaconda with
>>> > a specific R kernel and had some issues. I am trying to sort that
>>> > out with the Anaconda people; if I end up with a tutorial on how to
>>> > properly set up glmmTMB in Anaconda, I will let you know. The
>>> > install worked fine in my standard R environment.
>>> >
>>> > On 24.06.2016 15:40, Ben Bolker wrote:
>>> >
>>> >> Probably for now the glmmTMB issues page is best.
>>> >>
>>> >> When you go there:
>>> >>
>>> >>  - details on installation problems/hiccups would be useful
>>> >>  - a reproducible example for the problem listed below would be useful
>>> >>  - dispformula is for allowing dispersion/residual variance to vary
>>> >>    with covariates (i.e., modeling heteroscedasticity)
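>>> >>
>>> >> (As an illustration of that last point only: a toy call, with the
>>> >> formula and variable names borrowed from the error report below,
>>> >> and nbinom2 used simply because it has a dispersion parameter to
>>> >> model:)
>>> >>
>>> >> library(glmmTMB)
>>> >> fit <- glmmTMB(y ~ 1 + x + (1 | b), data = data,
>>> >>                family = nbinom2(),
>>> >>                dispformula = ~ 1 + x)  # dispersion varies with x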
>>> >>
>>> >> cheers
>>> >> Ben Bolker
>>> >>
>>> >>
>>> >> On 16-06-24 09:13 AM, Philipp Singer wrote:
>>> >>
>>> >>> Update: I tried it like that, but I receive an error message.
>>> >>>
>>> >>> Warning message:
>>> >>> In nlminb(start = par, objective = fn, gradient = gr):
>>> >>>   NA/NaN function evaluation
>>> >>>
>>> >>> Error in solve.default(hessian.fixed):
>>> >>>   Lapack routine dgesv: system is exactly singular: U[3,3] = 0
>>> >>> Traceback:
>>> >>>
>>> >>> 1. glmmTMB(y ~ 1 + x + (1 | b),
>>> >>> .      data = data, family = "poisson", dispformula = ~1 + x)
>>> >>> 2. sdreport(obj)
>>> >>> 3. solve(hessian.fixed)
>>> >>> 4. solve(hessian.fixed)
>>> >>> 5. solve.default(hessian.fixed)
>>> >>>
>>> >>> Any ideas on that?
>>> >>>
>>> >>> BTW: Is it fine to post glmmTMB questions here, or should I rather
>>> >>> use the GitHub issues page, or is there maybe a dedicated mailing
>>> >>> list?
>>> >>>
>>> >>> Thanks,
>>> >>> Philipp
>>> >>>
>>> >>> On 24.06.2016 14:35, Philipp Singer wrote:
>>> >>>
>>> >>>> It indeed seems to run quite fast; I had some trouble installing,
>>> >>>> but it works now on my R 3.3 setup.
>>> >>>>
>>> >>>> One question I have concerns the specification of the dispersion,
>>> >>>> since I need to specify dispformula: what is the difference
>>> >>>> between putting only fixed effects in it vs. also random effects?
>>> >>>>
>>> >>>> On 23.06.2016 23:07, Mollie Brooks wrote:
>>> >>>>
>>> >>>>> glmmTMB does crossed RE. Ben did some timings in
>>> >>>>> vignette("glmmTMB") and it was 2.3 times faster than glmer for
>>> >>>>> one simple GLMM.
>>> >>>>>
>>> >>>>>
>>> >>>>>> On 23 Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote:
>>> >>>>>>
>>> >>>>>> I did try glmmADMB, but unfortunately it is way too slow for my
>>> >>>>>> data.
>>> >>>>>>
>>> >>>>>> I did not know about glmmTMB; I will try it out. Does it work
>>> >>>>>> with crossed random effects, and how does it scale with more
>>> >>>>>> data? I will check the documentation and give it a try. Thanks
>>> >>>>>> for the info.
>>> >>>>>>
>>> >>>>>> On 23.06.2016 19:14, Ben Bolker wrote:
>>> >>>>>>
>>> >>>>>>> I would also comment that glmmTMB is likely to be much faster
>>> >>>>>>> than the lme4-based EM approach ...
>>> >>>>>>>
>>> >>>>>>> cheers
>>> >>>>>>> Ben B.
>>> >>>>>>>
>>> >>>>>>> On 16-06-23 12:47 PM, Mollie Brooks wrote:
>>> >>>>>>>
>>> >>>>>>>> Hi Philipp,
>>> >>>>>>>>
>>> >>>>>>>> You could also try fitting the model with and without ZI using
>>> >>>>>>>> either glmmADMB or glmmTMB, then compare the AICs. I believe
>>> >>>>>>>> model selection is useful for this, but I could be missing
>>> >>>>>>>> something, since the simulation procedure that Thierry
>>> >>>>>>>> described seems to be recommended more often.
>>> >>>>>>>>
>>> >>>>>>>> https://github.com/glmmTMB/glmmTMB
>>> >>>>>>>> http://glmmadmb.r-forge.r-project.org
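>>> >>>>>>>>
>>> >>>>>>>> (A toy sketch of that AIC comparison with glmmTMB; the formula
>>> >>>>>>>> and variable names are borrowed from elsewhere in this thread,
>>> >>>>>>>> and ziformula is the current glmmTMB argument for the
>>> >>>>>>>> zero-inflation part:)
>>> >>>>>>>>
>>> >>>>>>>> library(glmmTMB)
>>> >>>>>>>> fit_pois <- glmmTMB(y ~ 1 + b + (1 | id), data = data,
>>> >>>>>>>>                     family = poisson)
>>> >>>>>>>> fit_zip  <- glmmTMB(y ~ 1 + b + (1 | id), ziformula = ~ 1,
>>> >>>>>>>>                     data = data, family = poisson)
>>> >>>>>>>> AIC(fit_pois, fit_zip)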
>>> >>>>>>>>
>>> >>>>>>>> glmmTMB is still in the development phase, but we've done a
>>> >>>>>>>> lot of testing.
>>> >>>>>>>>
>>> >>>>>>>> cheers, Mollie
>>> >>>>>>>>
>>> >>>>>>>> ------------------------
>>> >>>>>>>> Mollie Brooks, PhD
>>> >>>>>>>> Postdoctoral Researcher, Population Ecology Research Group
>>> >>>>>>>> Department of Evolutionary Biology & Environmental Studies,
>>> >>>>>>>> University of Zürich
>>> >>>>>>>> http://www.popecol.org/team/mollie-brooks/
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> On 23 Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> Thanks, great information, that is really helpful.
>>> >>>>>>>>>
>>> >>>>>>>>> I agree that those are different things; however, when using
>>> >>>>>>>>> a random effect for overdispersion, I can simulate the same
>>> >>>>>>>>> number of zero outcomes (~95%).
>>> >>>>>>>>>
>>> >>>>>>>>> On 23.06.2016 15:50, Thierry Onkelinx wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>> Be careful when using overdispersion to model
>>> >>>>>>>>>> zero-inflation. Those are two different things.
>>> >>>>>>>>>>
>>> >>>>>>>>>> I've put some information together in
>>> >>>>>>>>>> http://rpubs.com/INBOstats/zeroinflation
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Thanks! It seems that accounting for overdispersion is
>>> >>>>>>>>>> super important; with it, the zeros can be captured well.
>>> >>>>>>>>>>
>>> >>>>>>>>>> On 23.06.2016 11:50, Thierry Onkelinx wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>> Dear Philipp,
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> 1. Fit a Poisson model to the data.
>>> >>>>>>>>>>> 2. Simulate a new response vector for the dataset
>>> >>>>>>>>>>>    according to the model.
>>> >>>>>>>>>>> 3. Count the number of zeros in the simulated response
>>> >>>>>>>>>>>    vector.
>>> >>>>>>>>>>> 4. Repeat steps 2 and 3 a decent number of times and plot
>>> >>>>>>>>>>>    a histogram of the number of zeros in the simulations.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> If the number of zeros in the original dataset is larger
>>> >>>>>>>>>>> than those in the simulations, then the model can't
>>> >>>>>>>>>>> capture all the zeros. In that case, first try to update
>>> >>>>>>>>>>> the model and repeat the procedure. If that fails, look
>>> >>>>>>>>>>> for zero-inflated models.
>>> >>>>>>>>>>>
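>>> >>>>>>>>>>> (A minimal sketch of this check, assuming a fitted model
>>> >>>>>>>>>>> object called "fit" with a simulate() method and the
>>> >>>>>>>>>>> observed response in data$y:)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> n_sim    <- 1000
>>> >>>>>>>>>>> sim_zero <- replicate(n_sim, sum(simulate(fit)[[1]] == 0))
>>> >>>>>>>>>>> hist(sim_zero, main = "simulated number of zeros")
>>> >>>>>>>>>>> abline(v = sum(data$y == 0), col = "red")  # observed zeros
>>> >>>>>>>>>>>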
>>> >>>>>>>>>>> Best regards,
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Thanks Thierry - that totally makes sense. Is there some
>>> >>>>>>>>>>> way of formally checking that, other than thinking about
>>> >>>>>>>>>>> the setting and the underlying processes?
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On 23.06.2016 11:04, Thierry Onkelinx wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> Dear Philipp,
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Do you have just lots of zeros, or more zeros than the
>>> >>>>>>>>>>>> Poisson distribution can explain? Those are two different
>>> >>>>>>>>>>>> things. The first example below generates data from a
>>> >>>>>>>>>>>> Poisson distribution and has 99% zeros but no
>>> >>>>>>>>>>>> zero-inflation. The second example has only 1% zeros but
>>> >>>>>>>>>>>> is clearly zero-inflated.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> set.seed(1)
>>> >>>>>>>>>>>> n <- 1e8
>>> >>>>>>>>>>>> sim <- rpois(n, lambda = 0.01)
>>> >>>>>>>>>>>> mean(sim == 0)
>>> >>>>>>>>>>>> hist(sim)
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> sim.infl <- rbinom(n, size = 1, prob = 0.99) *
>>> >>>>>>>>>>>>             rpois(n, lambda = 1000)
>>> >>>>>>>>>>>> mean(sim.infl == 0)
>>> >>>>>>>>>>>> hist(sim.infl)
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> So before looking for zero-inflated models, try to model
>>> >>>>>>>>>>>> the zeros.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Best regards,
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Dear group - I am currently fitting a Poisson glmer where
>>> >>>>>>>>>>>> I have an excess of outcomes that are zero (>95%). I am
>>> >>>>>>>>>>>> now debating how to proceed and came up with three options:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 1.) Just fit a regular glmer to the complete data. I am
>>> >>>>>>>>>>>> not fully sure how to interpret the coefficients then: are
>>> >>>>>>>>>>>> they mostly driven by distinguishing zero from non-zero
>>> >>>>>>>>>>>> outcomes, or do they also capture the differences among
>>> >>>>>>>>>>>> the non-zero outcomes?
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 2.) Leave all zeros out of the data and fit a glmer to
>>> >>>>>>>>>>>> only those outcomes that are non-zero. Then I would only
>>> >>>>>>>>>>>> learn about differences among the non-zero outcomes,
>>> >>>>>>>>>>>> though.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 3.) Use a zero-inflated Poisson model. My data is quite
>>> >>>>>>>>>>>> large-scale, so I am currently playing around with the EM
>>> >>>>>>>>>>>> implementation of Bolker et al. that alternates between
>>> >>>>>>>>>>>> fitting a glmer with data that are weighted according to
>>> >>>>>>>>>>>> their zero probability, and fitting a logistic regression
>>> >>>>>>>>>>>> for the probability that a data point is zero. The method
>>> >>>>>>>>>>>> is elaborated for the OWL data in:
>>> >>>>>>>>>>>> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
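>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> (For concreteness, a minimal sketch of such an alternating
>>> >>>>>>>>>>>> scheme; the formulas, column names ("y", "b", "id"),
>>> >>>>>>>>>>>> starting values and fixed number of iterations are my own
>>> >>>>>>>>>>>> assumptions, not code taken from the owls write-up:)
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> library(lme4)
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> zip_em <- function(data, n_iter = 20) {
>>> >>>>>>>>>>>>   # start: treat every observed zero as 50% likely to be
>>> >>>>>>>>>>>>   # a structural zero
>>> >>>>>>>>>>>>   data$z <- ifelse(data$y == 0, 0.5, 0)
>>> >>>>>>>>>>>>   for (i in seq_len(n_iter)) {
>>> >>>>>>>>>>>>     data$w <- 1 - data$z
>>> >>>>>>>>>>>>     # count part: Poisson glmer, down-weighting likely
>>> >>>>>>>>>>>>     # structural zeros
>>> >>>>>>>>>>>>     fit_count <- glmer(y ~ 1 + b + (1 | id), data = data,
>>> >>>>>>>>>>>>                        family = poisson, weights = w)
>>> >>>>>>>>>>>>     # zero part: probability of a structural zero
>>> >>>>>>>>>>>>     fit_zero <- glm(z ~ 1, data = data,
>>> >>>>>>>>>>>>                     family = quasibinomial)
>>> >>>>>>>>>>>>     # E-step: posterior probability that an observed zero
>>> >>>>>>>>>>>>     # is structural
>>> >>>>>>>>>>>>     lambda <- predict(fit_count, type = "response")
>>> >>>>>>>>>>>>     p      <- predict(fit_zero,  type = "response")
>>> >>>>>>>>>>>>     data$z <- ifelse(data$y == 0,
>>> >>>>>>>>>>>>                      p / (p + (1 - p) * exp(-lambda)), 0)
>>> >>>>>>>>>>>>   }
>>> >>>>>>>>>>>>   list(count = fit_count, zero = fit_zero, z = data$z)
>>> >>>>>>>>>>>> }
>>> >>>>>>>>>>>>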
>>> >>>>>>>>>>>> I am not fully sure how to interpret the results for the
>>> >>>>>>>>>>>> zero-inflated version, though. Would I need to interpret
>>> >>>>>>>>>>>> the coefficients from the glmer part similarly to what I
>>> >>>>>>>>>>>> would do for my idea 2), and then, on top of that,
>>> >>>>>>>>>>>> interpret the coefficients of the logistic regression as
>>> >>>>>>>>>>>> describing whether an observation is in the perfect or
>>> >>>>>>>>>>>> imperfect state? I am also not quite sure what the common
>>> >>>>>>>>>>>> approach for the zformula is here. The OWL elaborations
>>> >>>>>>>>>>>> only use zformula = z ~ 1, so no random effect; I would
>>> >>>>>>>>>>>> use the same formula as for the glmer.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I would appreciate some help and pointers.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Thanks!
>>> >>>>>>>>>>>> Philipp