[R-sig-ME] R: mixed zero-inflated poisson regression
Highland Statistics Ltd
highstat at highstat.com
Mon Nov 14 17:49:06 CET 2011
------------------------------
Message: 3
Date: Mon, 14 Nov 2011 17:17:47 +0100
From: Ivar Herfindal<ivar.herfindal at bio.ntnu.no>
To: "ONKELINX, Thierry"<Thierry.ONKELINX at inbo.be>
Cc: "r-sig-mixed-models at r-project.org"
<r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] mixed zero-inflated poisson regression
Message-ID:<4EC13F2B.5040307 at bio.ntnu.no>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Dear Thierry
Thank you for your quick response. I have previously tried to follow the
CourseNote you suggested without success. I gave it another try today,
and it seems I have manged to generate such a zero-inflated model as I
want during the day. The syntax looks something like this:
mcmodm1<- MCMCglmm(Y ~ trait -1 + at.level(trait, 1:2):A, data=dataa,
family="zipoisson", rcov=~idh(trait):units, nitt=30000, random=~gr)
However, the main challenge is that the model seems quite unstable
withmy "real" data. Even with 100000 iterations, two identical specified
models gives quite different output, and plotting the mcmodm1$Sol
suggest not very stable estimates. This occurs also on my simulated data.
Do anyone have any suggestion on how to make the model more stable (e.g.
by the rcov or random arguments)? Or do I just have to run it as long as
it takes to get "fairly precise" estimates?
Ivan
First of all I don't think that 100,000 iterations are enough for a ZIP GLMM. Also...you need to ask yourself whether you really want
to use covariates and random effects in the binary part. I assume that is what your MCMCglmm code is doing (??)..I haven't used MCMCglmm.
It does increase the complexity of the model.
Also...before applying ZIP GLMMs you need to ask yourself, and investigate, where all these zeros are coming from. It may well be that the
binary part of the model tries to model the zeros, whereas the random effect terms may also try to capture the zeros. This may happen
if most of your zeros are clustered in your groups (or levels) for the random effect. As a result the chains coming out of MCMC for the different parameters
are highly correlated. Check mixing of the chains....and plot chains of different parameters to see whether they are correlated...I guess they are. And that is
most likely because different components in the model are fighting for the zeros.
The above problem is perhaps easier to understand for a time series. Suppose you have a long time series of counts...but most counts are 0. So..is this zero inflation or
temporal auto-correlation? Fit a ZIP GLM with temporal correlation and you will get the same problems as you described above (poor mixing).
Zero inflation doesn't always mean you have to apply zero inflated models. Perhaps an ordinary GLMM can model the zeros? My gutfeeling is that
your model is to complicated (ie. random effects vs the zero inflation part of the model). Solution...simplify the model.
See:
Zero Inflated Models and Generalized Linear Mixed Models with WinBUGS and R (2012). Zuur et al.
for the same problems (and solutions).
Alain
Ivar
On 14.11.2011 10:40, ONKELINX, Thierry wrote:
> Dear Ivar,
>
> Your assumption on MCMCglmm is incorrect. You can specify different predictor for both the count and the binomial component. Have a look at the CourseNotes vignette for an example.
>
> Best regards,
>
> Thierry
>
> ----------------------------------------------------------------------------
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek
> team Biometrie& Kwaliteitszorg
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
>
> Research Institute for Nature and Forest
> team Biometrics& Quality Assurance
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
>
> tel. + 32 54/436 185
> Thierry.Onkelinx at inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
>> -----Oorspronkelijk bericht-----
>> Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-
>> bounces at r-project.org] Namens Ivar Herfindal
>> Verzonden: maandag 14 november 2011 9:57
>> Aan: r-sig-mixed-models at r-project.org
>> Onderwerp: [R-sig-ME] mixed zero-inflated poisson regression
>>
>> Dear all
>>
>> It seems like I have reached the limit of my statistical skills, and now turn to the
>> mixed-list for some assistance.
>>
>> I have a dataset with a response variable that appears to have a distribution that
>> is zero-inflated poisson (ZIP) (or close to). The response variable is number of
>> fledglings from nests with known mothers.
>> The number of fledglings has a lot of zeros that comes from failed breeding
>> attempts. The number of fledgings can therefore be considered as a two-step
>> process:
>> 1. do the female breed, and
>> 2. if the female breed, what was the number of fledglings.
>>
>> I want to relate the number of fledglings to some predictor variables, and I am
>> particularly interested in the combined effect of these variables on the
>> probability to breed and number of fledglings given breeding. I have several
>> observations per female (multiple years) and each nest location is also
>> represented by several females (in different years). I therefore need/want to
>> account for random factors. I guess I could run two separate models (one
>> binomial and one poisson), but I need to make predictions on how my
>> explanatory variables affect the number of fledglings irrespective of the
>> mechanism behind.
>>
>> I have searched R-sites a lot, and it found some packages that can model ZIP.
>> 1. pscl (function zeroinfl)
>> Pros: have the option to relate both the poisson and binomial process to the
>> predictor variable, so I can evaluate how probability of breeding and number of
>> fledglings depend on my predictor variables
>> Cons: Can not include random factors.
>>
>> 2. glmmADMB (function glmmadmb), MCMCglmm, and gamlss
>> Pros: allows random factors
>> Cons: 1. all packages seems to assume that all zeros have the same probability
>> of belonging to the zero component, i.e. the logit-link structure does not depend
>> on the predictor variable.
>> 2. For some of the packages (particularly MCMCglmm) I have some challenges
>> about how to specify the model correctly.
>>
>> Does anyopne in the R(mixed) community have experience on this? I provide a
>> simple example below to illustrate what I want.
>>
>> Ivar Herfindal
>>
>> ## A simple reproducible example
>> library(pscl)
>> library(glmmADMB)
>> library(MCMCglmm)
>> library(gamlss)
>> set.seed(4242)
>>
>> # create some dummy data that have a lot of zeros in the dependent variable
>> dataa<- cbind.data.frame(Y=rbinom(100, 1, 0.7)) dataa$Y<- dataa$Y *
>> rpois(100, 3) # the response dataa$A<- rnorm(100) # explanatory variable
>> dataa$gr<- factor(1:10) # grouping variable
>>
>> # fit models with different packages
>>
>> ## First non-mixed models
>> ### pscl zeroinfl
>> zimod<- zeroinfl(Y ~ A, data=dataa, dist="poisson") # Get estimate of A on
>> poisson and binomial process separate, this is what I want (I believe so...)
>>
>> zimod2<- zeroinfl(Y ~ A|1, data=dataa, dist="poisson") # All zero counts have
>> the same probability of belonging to the zero component
>>
>> ### glmmADMB
>> admod<- glmmadmb(Y ~ A, data=dataa, family="poisson", zeroInflation=TRUE)
>> # Same results as zimod2.
>> # Q: Is it possible to obtain similar results as zimod, i.e.
>> # separate the effect of A on the poisson and binomial process?
>>
>> ### MCMCglmm
>> mcmod<- MCMCglmm(Y ~ A, data=dataa, family="zipoisson",
>> rcov=~idh(trait):units)
>> # more or less same as admod and zimod2, but "plot(mcmod$Sol)" reveals large
>> variation.
>> # Can change burn and nitt to get more stable(?)
>>
>> mcmod2<- MCMCglmm(Y ~ A, data=dataa, family="zipoisson",
>> rcov=~idh(trait):units, burn=10000, nitt=20000) # Q: Is it possible to separate the
>> effect of A on the poisson and binomial process?
>>
>> ### gamlss
>> gamod<- gamlss(Y ~ A, data=dataa, family=ZIP) # same as zimod2 and admod
>>
>> ## Some attempts on mixed models with gr as random factor admodm<-
>> glmmadmb(Y ~ A, data=dataa, family="poisson", zeroInflation=TRUE,
>> random=~(1|gr)) mcmodm<- MCMCglmm(Y ~ A, data=dataa,
>> family="zipoisson", rcov=~idh(trait):units, random=~idh(trait):gr) gamodm<-
>> gamlss(Y ~ A + random(gr), data=dataa, family=ZIP) # For glmmadmb and gamlss
>> I think the models are specified ok # For MCMCglmm I am not sure if it is
>> specified correctly # But I still want to be able to separate the effect of A on the
>> two processes.
>>
>> --
>> --------------------------------------------------
>> Ivar Herfindal (PhD, Researcher)
>> Centre for Conservation Biology
>> Department of Biology
>> Norwegian University for Science and Technology
>> N-7491 Trondheim, Norway
>>
>> email: Ivar.Herfindal at bio.ntnu.no
>> web: http://www.ntnu.no/employees/ivar.herfindal
>> phone: +47 73596253
>> Cellphone: +47 91625333
>> fax: +47 73596090
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
--------------------------------------------------
Ivar Herfindal (PhD, Researcher)
Centre for Conservation Biology
Department of Biology
Norwegian University for Science and Technology
N-7491 Trondheim, Norway
email: Ivar.Herfindal at bio.ntnu.no
web: http://www.ntnu.no/employees/ivar.herfindal
phone: +47 73596253
Cellphone: +47 91625333
fax: +47 73596090
------------------------------
_______________________________________________
R-sig-mixed-models mailing list
R-sig-mixed-models at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
End of R-sig-mixed-models Digest, Vol 59, Issue 18
More information about the R-sig-mixed-models
mailing list