[R-sig-ME] Question about zero-inflated Poisson glmer
Philipp Singer
killver at gmail.com
Thu Jun 23 17:22:36 CEST 2016
Thanks, that is great information and really helpful.
I agree that those are different things. However, when I use a random
effect for overdispersion, my simulations reproduce the same share of
zero outcomes as in the data (~95%).
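
A minimal sketch of that check, assuming the lme4 package is installed. The toy data and the names d, obs, fit_pois, fit_olre and zero_share are made up for illustration; they are not the data or code from this thread. It fits a Poisson glmer with and without an observation-level random effect, simulates new responses from each fit, and compares the simulated share of zeros with the observed one.

```r
library(lme4)

set.seed(42)

# Hypothetical toy data: 40 groups of 25 observations, with extra
# observation-level noise so that the counts are overdispersed.
n_groups <- 40
n_per    <- 25
d <- data.frame(
  group = factor(rep(seq_len(n_groups), each = n_per)),
  x     = rnorm(n_groups * n_per)
)
d$obs <- factor(seq_len(nrow(d)))   # one level per row: observation-level RE

group_eff <- rnorm(n_groups, sd = 0.5)
eta <- -1 + 0.5 * d$x +
  group_eff[as.integer(d$group)] +  # group-level random intercept
  rnorm(nrow(d), sd = 1.5)          # observation-level overdispersion
d$y <- rpois(nrow(d), lambda = exp(eta))

# Step 1: fit a plain Poisson GLMM and one with the observation-level RE.
fit_pois <- glmer(y ~ x + (1 | group), family = poisson, data = d)
fit_olre <- glmer(y ~ x + (1 | group) + (1 | obs), family = poisson, data = d)

# Steps 2 to 4: simulate response vectors and record the share of zeros.
# simulate() draws new random effects by default, which is what lets the
# observation-level term generate extra zeros.
zero_share <- function(fit, nsim = 200) {
  sims <- simulate(fit, nsim = nsim)
  colMeans(sims == 0)
}

obs_zero  <- mean(d$y == 0)
zero_pois <- zero_share(fit_pois)
zero_olre <- zero_share(fit_olre)

# Histogram of simulated zero shares with the observed share marked.
hist(zero_olre, main = "Simulated share of zeros (OLRE model)",
     xlab = "Proportion of zeros")
abline(v = obs_zero, lwd = 2)

# Fraction of simulations with at least as many zeros as observed;
# values near 0 suggest the model cannot reproduce the zeros.
mean(zero_pois >= obs_zero)
mean(zero_olre >= obs_zero)
```

If the observed share sits well inside the histogram for the model with the observation-level term, the zeros are plausibly explained by overdispersion alone. That still does not show that the two mechanisms are the same thing.
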
On 23.06.2016 15:50, Thierry Onkelinx wrote:
> Be careful when using overdispersion to model zero-inflation. Those
> are two different things.
>
> I've put some information together in
> http://rpubs.com/INBOstats/zeroinflation
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
> and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no
> more than asking him to perform a post-mortem examination: he may be
> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does
> not ensure that a reasonable answer can be extracted from a given body
> of data. ~ John Tukey
>
> 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com>:
>
> Thanks! Accounting for overdispersion seems to be really important;
> once it is included, the zeros are captured well.
>
>
> On 23.06.2016 11:50, Thierry Onkelinx wrote:
>> Dear Philipp,
>>
>> 1. Fit a Poisson model to the data.
>> 2. Simulate a new response vector for the dataset according to
>> the model.
>> 3. Count the number of zeros in the simulated response vector.
>> 4. Repeat steps 2 and 3 a decent number of times and plot a
>> histogram of the number of zeros in the simulations. If the
>> number of zeros in the original dataset is larger than in
>> the simulations, then the model can't capture all the zeros. In that
>> case, first try to update the model and repeat the procedure. If
>> that fails, look for zero-inflated models.
>>
>> Best regards,
>>
>> ir. Thierry Onkelinx
>>
>> 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com>:
>>
>> Thanks Thierry - that totally makes sense. Is there some way of
>> formally checking that, other than thinking about the setting and
>> underlying processes?
>>
>> On 23.06.2016 11:04, Thierry Onkelinx wrote:
>> > Dear Philipp,
>> >
>> > Do you have just lots of zeros, or more zeros than the Poisson
>> > distribution can explain? Those are two different things. The example
>> > below generates data from a Poisson distribution and has 99% zeros
>> > but no zero-inflation. The second example has only 1% zeros but is
>> > clearly zero-inflated.
>> >
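>> > set.seed(1)
>> > n <- 1e8
>> > # Plain Poisson with a small mean: about 99% zeros, no zero-inflation
>> > sim <- rpois(n, lambda = 0.01)
>> > mean(sim == 0)
>> > hist(sim)
>> >
>> > # Zero-inflated Poisson: only about 1% zeros, yet far more than
>> > # Poisson(1000) alone would produce
>> > sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)
>> > mean(sim.infl == 0)
>> > hist(sim.infl)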
>> >
>> > So before looking for zero-inflated models, try to model the zeros.
>> >
>> > Best regards,
>> >
>> >
>> > ir. Thierry Onkelinx
>> >
>> > 2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com>:
>> >
>> > Dear group - I am currently fitting a Poisson glmer where I have
>> > an excess of outcomes that are zero (>95%). I am now debating
>> > how to proceed and came up with three options:
>> >
>> > 1.) Just fit a regular glmer to the complete data. I am not fully
>> > sure how to interpret the coefficients then: are they mostly
>> > distinguishing zero from non-zero outcomes, or do they also capture
>> > the differences among the outcomes that are non-zero?
>> >
>> > 2.) Leave all zeros out of the data and fit a glmer to only those
>> > outcomes that are non-zero. Then I would only learn about
>> > differences in the non-zero outcomes, though.
>> >
>> > 3.) Use a zero-inflated Poisson model. My data is quite
>> > large-scale, so I am currently playing around with the EM
>> > implementation of Bolker et al. that alternates between fitting a
>> > glmer with data that are weighted according to their zero
>> > probability, and fitting a logistic regression for the probability
>> > that a data point is zero. The method is elaborated for the OWL
>> > data in:
>> > https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
>> >
>> > I am not fully sure how to interpret the results for the
>> > zero-inflated version though. Would I need to interpret the
>> > coefficients of the glmer in the same way as I would
>> > for my idea in 2)? And then, on top of that, interpret the
>> > coefficients of the logistic regression as describing whether
>> > something is in the perfect or imperfect state? I am also not
>> > quite sure what the common approach for the zformula is here. The
>> > OWL elaborations only use zformula=z~1, so no random effect; I
>> > would use the same formula as for the glmer.
>> >
>> > I would appreciate some help and pointers.
>> >
>> > Thanks!
>> > Philipp
>> >
>> > _______________________________________________
>> > R-sig-mixed-models at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >
>> >
>>
>>
>> [[alternative HTML version deleted]]
>>
>
>