[R-sig-ME] Mixed multinomial regression for count data with overdispersion & zero-inflation

Highland Statistics Ltd highstat at highstat.com
Wed May 18 09:39:53 CEST 2016



On 18/05/2016 08:26, Stéphanie Périquet wrote:
> Yeah thanks Alain, I'm definitely planning to buy this book!
>
> So I looked at the zeros in my data based on your advice and I did the
> following:
> mod<-glmer(count~item+item:season+item:moon+item:season:moon+(1|indiv/obs)+(1|id),family=poisson,nAGQ=0,data=diet3)
> z<-simulate(mod,nsim=1000)
>
> For the original data I have 69.3% of zeros, while the average over the
> 1000 simulations is 63.5%. Is there a way to statistically compare
> these 2 values? Or could you say that these 2 figures are not very
> different and that zero-inflation models might therefore not be necessary?
>

Stephanie,

Make a histogram of the 1000 simulated percentages of zeros and
present the observed 69.3% as a big blue/red dot. If the dot for your
observed data is in the tails, you have a problem: the model does not
reproduce the number of zeros in your data.
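
A minimal sketch of such a plot, using a toy Poisson GLM in base R as a
stand-in (the data frame 'toy' and the model 'fit' are made up for
illustration; with your glmer fit you would use 'mod' and 'diet3$count'
instead, assuming simulate() again returns one column per simulated data
set):

```r
# Toy data and model, standing in for the real fit (assumption).
set.seed(1)
toy <- data.frame(x = runif(200))
toy$count <- rpois(200, lambda = exp(-0.5 + 0.8 * toy$x))
fit <- glm(count ~ x, family = poisson, data = toy)

# One column per simulated data set.
z <- simulate(fit, nsim = 1000)

obs_zero <- 100 * mean(toy$count == 0)   # observed percentage of zeros
sim_zero <- 100 * colMeans(z == 0)       # percentage of zeros per simulation

# Histogram of the simulated percentages, observed value as a red dot.
hist(sim_zero, breaks = 30, xlim = range(c(sim_zero, obs_zero)),
     xlab = "Percentage of zeros", main = "Simulated vs observed zeros")
points(obs_zero, 0, pch = 16, cex = 2, col = "red")
```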

I don't see the point of a test in your case. Such a simulation is close 
to bootstrapping...so I guess you can come up with a test somehow. If 
you do this type of analysis in a Bayesian framework it is often (and 
confusingly) called a Bayesian p-value (counting how often the simulated 
value is larger than your observed one).
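
If you do want a number, that counting can be sketched as below. This is
again a toy base-R example with made-up data, and it is a simulation-based
tail proportion from a frequentist fit, not a Bayesian p-value in the
strict sense:

```r
# Toy data and model, standing in for the real fit (assumption).
set.seed(2)
toy <- data.frame(x = runif(200))
toy$count <- rpois(200, lambda = exp(-0.5 + 0.8 * toy$x))
fit <- glm(count ~ x, family = poisson, data = toy)

z <- simulate(fit, nsim = 1000)
obs_zero <- mean(toy$count == 0)   # observed proportion of zeros
sim_zero <- colMeans(z == 0)       # proportion of zeros per simulation

# Proportion of simulated data sets with at least as many zeros as observed.
# Values close to 0 or 1 put the observed value in a tail.
p_tail <- mean(sim_zero >= obs_zero)
p_tail
```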

I would just go for the histogram...seems you are lucky.

Alain






> Best,
> Stephanie
>
> On 17 May 2016 at 20:21, Highland Statistics Ltd
> <highstat at highstat.com> wrote:
>
>
>
>     On 17/05/2016 18:53, Stéphanie Périquet wrote:
>>     Dear Alain,
>>
>>     Thanks for your reply and advice! I will try that, and will wait
>>     for your very timely paper to come out to be sure I did the right
>>     thing!
>>
>
>     Stephanie,
>
>     Although it does not cover multinomial models directly, this one
>     may be of use as well:
>
>     Beginner's Guide to Zero-Inflated Models with R (2016). Zuur AF
>     and Ieno EN
>     http://highstat.com/BGZIM.htm
>
>     Sorry for the self-references.
>
>     Kind regards,
>
>     Alain
>
>
>>     Best,
>>     Stephanie
>>
>>     On 17 May 2016 at 12:08, Highland Statistics Ltd
>>     <highstat at highstat.com> wrote:
>>
>>
>>
>>
>>         > ----------------------------------------------------------------------
>>         >
>>         > Message: 1
>>         > Date: Tue, 17 May 2016 08:28:42 +0200
>>         > From: Stéphanie Périquet <stephanie.periquet at gmail.com>
>>         > To: Ben Bolker <bbolker at gmail.com>
>>         > Cc: r-sig-mixed-models at r-project.org
>>         > Subject: Re: [R-sig-ME] Mixed multinomial regression for count data
>>         >       with overdispersion & zero-inflation
>>         >
>>         > Hi Ben,
>>         >
>>         > Thank you very much for your answer!
>>         >
>>         > I am aware that a lot of zeros doesn't mean zero inflation,
>>         > but if my understanding is correct, the only way to check for
>>         > ZI would be to compare one model that doesn't take it into
>>         > account and another one that does, right?
>>
>>         Incorrect.
>>         1. Calculate the percentage of zeros for your observed data.
>>         2. Fit a model....this can be a model without zero inflation
>>         stuff.
>>         3. Simulate 1000 data sets from your model and for each
>>         simulated data
>>         set assess the percentage of zeros.
>>         4. Compare the results in 3 with those in 1.
>>
>>         5. Even nicer....
>>         5a. Plot a simple frequency table for the original data:
>>             plot(table(Response), type = "h")
>>         5b. Calculate a table() for each of your simulated data sets.
>>         5c. Calculate the average frequency table.
>>         5d. Compare 5a and 5c.
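
Steps 5a to 5d can be sketched roughly as follows. This is a toy base-R
example, not the code from the paper: the data frame 'toy', the model
'fit' and the helper vector 'lev' are made up for illustration, and
'toy$count' plays the role of Response.

```r
# Toy data and model, standing in for the real fit (assumption).
set.seed(3)
toy <- data.frame(x = runif(200))
toy$count <- rpois(200, lambda = exp(-0.5 + 0.8 * toy$x))
fit <- glm(count ~ x, family = poisson, data = toy)
z <- simulate(fit, nsim = 1000)

# Common set of count values, so that every table has the same length.
lev <- 0:max(toy$count, unlist(z))

# 5a: frequency table of the observed data.
obs_tab <- table(factor(toy$count, levels = lev))
# 5b: one frequency table per simulated data set (one column each).
sim_tab <- sapply(z, function(y) as.vector(table(factor(y, levels = lev))))
# 5c: average frequency table over the simulations.
avg_tab <- rowMeans(sim_tab)

# 5d: observed frequencies as vertical lines, simulated averages as red dots.
plot(obs_tab, type = "h", ylim = c(0, max(obs_tab, avg_tab)),
     xlab = "Count", ylab = "Frequency")
points(lev, avg_tab, pch = 16, col = "red")
```

If the red dot at zero sits well below the observed line, the model
produces too few zeros.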
>>
>>         For a nice example and R code, see:
>>         A protocol for conducting and presenting results of
>>         regression-type
>>         analyses. Zuur & Ieno
>>         doi: 10.1111/2041-210X.12577
>>         Methods in Ecology and Evolution 2016
>>
>>         Comes out in 2 weeks or so.
>>
>>         Kind regards,
>>
>>         Alain
>>
>>
>>         > With the model example I gave (count~item+item:season+item:
>>         > moon+offset(logduration)+(1+indiv)+(1|obs)) glmmADMB
>>         > doesn't run, but I'm going to dig a bit more into this and
>>         > come back to you if I can't figure it out.
>>         >
>>         > Best,
>>         > Stephanie
>>         >
>>         > On 17 May 2016 at 00:41, Ben Bolker <bbolker at gmail.com> wrote:
>>         >
>>         >> Stéphanie Périquet <stephanie.periquet at ...> writes:
>>         >>
>>         >>> Dear list members,
>>         >>>
>>         >>> First, sorry for this very long first post.
>>         >>    That's OK.  I'm only going to answer part of it,
>>         because it's long.
>>         >>> I am looking for advice on fitting a mixed multinomial
>>         >>> regression to count data that are overdispersed and
>>         >>> zero-inflated. My question is to evaluate the effect of season
>>         >>> and moonlight on the diet composition of bat-eared foxes.
>>         >>> My dataset is composed of 14 possible prey items, 20
>>         >>> individual foxes observed, 4 seasons and a moon illumination
>>         >>> index ranging from 0 to 1 in increments of 0.1 (considered as
>>         >>> a continuous variable even if it takes only 11 values). For
>>         >>> each unique combination of individual*season*moon, I thus
>>         >>> have 14 lines, one for the count of each prey item.
>>         >>>
>>         >>> From what I gathered, it would be possible to use a standard
>>         >>> GLMM of the following form to answer my question (i.e. a
>>         >>> multinomial regression):
>>         >>>
>>         >>> glmer(count~item+item:season+item:moon+offset(logduration)+
>>         >>> (1+indiv)+(1|obs)+
>>         >>> (1|id), family=poisson)
>>         >>    Yes, but I don't know if this will account for the
>>         possible dependence
>>         >> *among* prey types.
>>         >>
>>         >>> where count is the number of prey of a given type recorded
>>         >>> eaten;
>>         >>>
>>         >>> item is the prey type;
>>         >>>
>>         >>> logduration is the log(total time observed for a given
>>         >>> combination of individual*season*moon);
>>         >>>
>>         >>> obs is a unique id for each combination of
>>         >>> individual*season*moon, so each obs value groups 14 lines
>>         >>> (one for each prey item) with the same individual*season*moon;
>>         >>>
>>         >>> id is a unique id for each line to account for overdispersion
>>         >>> (as quasi-Poisson or negative binomial distributions are not
>>         >>> implemented in lme4, Elston et al. 2001).
>>         >>     Seems about right.
>>         >>     There is glmer.nb now, but you might not want it; it
>>         tends to
>>         >> be slower and more fragile, and you'd still have to deal with
>>         >> zero-inflation.
>>         >>
>>         >>> However, there are a lot of zeros in my data, i.e. many prey
>>         >>> items have never been observed being eaten for many
>>         >>> combinations of individual*season*moon.
>>         >>    That doesn't *necessarily* mean you need
>>         zero-inflation. Large
>>         >> numbers of zeros might just reflect low probabilities, not
>>         ZI per se.
>>         >>
>>         >>> Following Ben Bolker's wiki (http://glmm.wikidot.com/faq), I
>>         >>> gather that I should use one of the following methods to
>>         >>> answer my question:
>>         >>>
>>         >>>     - glmmADMB, with family=nbinom
>>         >>>     - MCMCglmm, with family=zipoisson
>>         >>>     - "expectation-maximization (EM) algorithm" in lme4
>>         >>    Note there's a marginally newer version at
>>         >>
>>         https://rawgit.com/bbolker/mixedmodels-misc/master/glmmFAQ.html
>>         >>
>>         >>    Another, newer choice is glmmTMB (available on Github) with
>>         >> family="nbinom2"
>>         >>
>>         >>> Here come the questions:
>>         >>> 1. Is it correct to assume that I could use the same model
>>         >>> structure
>>         >>> (count~item+item:season+item:moon+offset(logduration)+(1+indiv)+(1|obs))
>>         >>> in glmmADMB or MCMCglmm to answer my question?
>>         >>    glmmADMB or glmmTMB, yes: I'm not sure about MCMCglmm
>>         >>
>>         >>> 2. I then wouldn't need the (1|id) to correct for
>>         >>> overdispersion, as both methods would already account for it,
>>         >>> correct?
>>         >>     That's right, I think.
>>         >>
>>         >>> 3.   I am totally new to MCMCglmm, so  ...
>>         >>    I'm going to let Jarrod Hadfield, or someone else,
>>         answer this one.
>>         >>> 4. If I were to use the EM algorithm method, how should the
>>         >>> results be interpreted?
>>         >>    The result is composed of two models -- a 'binary'
>>         (structural zero vs
>>         >> non-structural zero) and a 'conditional' (count) part.
>>         >> _______________________________________________
>>         >> R-sig-mixed-models at r-project.org mailing list
>>         >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>         >
>>         >
>>         >
>>
>>         --
>>         Dr. Alain F. Zuur
>>
>>         First author of:
>>         1. Beginner's Guide to GAMM with R (2014).
>>         2. Beginner's Guide to GLM and GLMM with R (2013).
>>         3. Beginner's Guide to GAM with R (2012).
>>         4. Zero Inflated Models and GLMM with R (2012).
>>         5. A Beginner's Guide to R (2009).
>>         6. Mixed effects models and extensions in ecology with R (2009).
>>         7. Analysing Ecological Data (2007).
>>
>>         Highland Statistics Ltd.
>>         9 St Clair Wynd
>>         UK - AB41 6DZ Newburgh
>>         Tel:   0044 1358 788177
>>         Email: highstat at highstat.com
>>         URL: www.highstat.com
>>
>>
>>     -- 
>>     *Stéphanie PERIQUET (PhD) * - Bat-eared Fox Research Project
>>     /Dept of Zoology & Entomology/
>>     /University of the Free State, Qwaqwa Campus/
>>     *Cell: +27 79 570 2683*
>>     ResearchGate profile
>>     <https://www.researchgate.net/profile/Stephanie_Periquet>
>>
>>
>>     Kalahari bat-eared foxes on Twitter
>>     <https://twitter.com/kal_batearedfox>
>

-- 
Dr. Alain F. Zuur

First author of:
1. Beginner's Guide to GAMM with R (2014).
2. Beginner's Guide to GLM and GLMM with R (2013).
3. Beginner's Guide to GAM with R (2012).
4. Zero Inflated Models and GLMM with R (2012).
5. A Beginner's Guide to R (2009).
6. Mixed effects models and extensions in ecology with R (2009).
7. Analysing Ecological Data (2007).

Highland Statistics Ltd.
9 St Clair Wynd
UK - AB41 6DZ Newburgh
Tel:   0044 1358 788177
Email: highstat at highstat.com
URL:   www.highstat.com

