[R-sig-ME] mixed zero-inflated poisson regression

Ivar Herfindal ivar.herfindal at bio.ntnu.no
Mon Nov 14 09:57:11 CET 2011


Dear all

It seems like I have reached the limit of my statistical skills, and now 
turn to the mixed-list for some assistance.

I have a dataset with a response variable that appears to have a 
distribution that is zero-inflated poisson (ZIP) (or close to). The 
response variable is number of fledglings from nests with known mothers. 
The number of fledglings has a lot of zeros that comes from failed 
breeding attempts. The number of fledgings can therefore be considered 
as a two-step process:
1. do the female breed, and
2. if the female breed, what was the number of fledglings.

I want to relate the number of fledglings to some predictor variables, 
and I am particularly interested in the combined effect of these 
variables on the probability to breed and number of fledglings given 
breeding. I have several observations per female (multiple years) and 
each nest location is also represented by several females (in different 
years). I therefore need/want to account for random factors. I guess I 
could run two separate models (one binomial and one poisson), but I need 
to make predictions on how my explanatory variables affect the number of 
fledglings irrespective of the mechanism behind.

I have searched R-sites a lot, and it found some packages that can model 
ZIP.
1. pscl (function zeroinfl)
Pros: have the option to relate both the poisson and binomial process to 
the predictor variable, so I can evaluate how probability of breeding 
and number of fledglings depend on my predictor variables
Cons: Can not include random factors.

2. glmmADMB (function glmmadmb), MCMCglmm, and gamlss
Pros: allows random factors
Cons: 1. all packages seems to assume that all zeros have the same 
probability of belonging to the zero component, i.e. the logit-link 
structure does not depend on the predictor variable.
2. For some of the packages (particularly MCMCglmm) I have some 
challenges about how to specify the model correctly.

Does anyopne in the R(mixed) community have experience on this? I 
provide a simple example below to illustrate what I want.

Ivar Herfindal

## A simple reproducible example
library(pscl)
library(glmmADMB)
library(MCMCglmm)
library(gamlss)
set.seed(4242)

# create some dummy data that have a lot of zeros in the dependent variable
dataa <- cbind.data.frame(Y=rbinom(100, 1, 0.7))
dataa$Y <- dataa$Y * rpois(100, 3) # the response
dataa$A <- rnorm(100) # explanatory variable
dataa$gr <- factor(1:10) # grouping variable

# fit models with different packages

## First non-mixed models
### pscl zeroinfl
zimod <- zeroinfl(Y ~ A, data=dataa, dist="poisson")
# Get estimate of A on poisson and binomial process separate, this is 
what I want (I believe so...)

zimod2 <- zeroinfl(Y ~ A|1, data=dataa, dist="poisson")
# All zero counts have the same probability of belonging to the zero 
component

### glmmADMB
admod <- glmmadmb(Y ~ A, data=dataa, family="poisson", zeroInflation=TRUE)
# Same results as zimod2.
# Q: Is it possible to obtain similar results as zimod, i.e.
# separate the effect of A on the poisson and binomial process?

### MCMCglmm
mcmod <- MCMCglmm(Y ~ A, data=dataa, family="zipoisson", 
rcov=~idh(trait):units)
# more or less same as admod and zimod2, but "plot(mcmod$Sol)" reveals 
large variation.
# Can change burn and nitt to get more stable(?)

mcmod2 <- MCMCglmm(Y ~ A, data=dataa, family="zipoisson", 
rcov=~idh(trait):units, burn=10000, nitt=20000)
# Q: Is it possible to separate the effect of A on the poisson and 
binomial process?

### gamlss
gamod <- gamlss(Y ~ A, data=dataa, family=ZIP)
# same as zimod2 and admod

## Some attempts on mixed models with gr as random factor
admodm <- glmmadmb(Y ~ A, data=dataa, family="poisson", 
zeroInflation=TRUE, random=~(1|gr))
mcmodm <- MCMCglmm(Y ~ A, data=dataa, family="zipoisson", 
rcov=~idh(trait):units, random=~idh(trait):gr)
gamodm <- gamlss(Y ~ A + random(gr), data=dataa, family=ZIP)
# For glmmadmb and gamlss I think the models are specified ok
# For MCMCglmm I am not sure if it is specified correctly
# But I still want to be able to separate the effect of A on the two 
processes.

-- 
--------------------------------------------------
Ivar Herfindal (PhD, Researcher)
Centre for Conservation Biology
Department of Biology
Norwegian University for Science and Technology
N-7491 Trondheim, Norway

email:     Ivar.Herfindal at bio.ntnu.no
web:       http://www.ntnu.no/employees/ivar.herfindal
phone:     +47 73596253
Cellphone: +47 91625333
fax:       +47 73596090




More information about the R-sig-mixed-models mailing list