[R-sig-ME] Zero-inflated mixed effects model - clarification of zeros modeled and R package questions

Paul Johnson pauljohn32 at gmail.com
Fri Jun 22 05:43:53 CEST 2012

Dear Jennifer:
Response below

On Wed, Jun 20, 2012 at 5:32 PM, Jennifer Barrett
<jenn.s.barrett at gmail.com> wrote:
> Hi folks,
> I’m looking for some guidance with regard to zero-inflated models with
> repeated measures (i.e., random effect for site). My first question is more
> of a statistical one, while the second is related to R packages. Apologies
> for the long post; however, I want to make sure my concerns/questions are
> clear!
> Our project and dataset:
> - The aim of our project is to 1) examine associations between shoreline
> habitat characteristics and the abundance of several shorebird species; and
> 2) estimate the total abundance of each shorebird species within the entire
> study region based on the models from 1) above, with confidence intervals.
> Note that we will be using an information theoretic approach for 1) above,
> and would like to use MMI for 2).
> - Our response dataset consists of counts of shorebirds at >150 coastal
> sites, conducted on the second Sunday of each month between the months of
> Oct-March, over 10 years; however, not every site was surveyed in all
> months (we’ve limited our dataset to those with a minimum of 3 counts in a
> year).  Our response variable is thus the number of birds counted in a
> given month/year at a given site. Note that we plan to model each year
> separately.
> -  The habitat dataset consists of shoreline units within our entire study
> region, with each unit characterized by exposure, substrate type...etc.
> Using GIS, we’ve measured the length of shoreline belonging to shoreline
> categories (e.g., sand, rock, mud) within each survey site, the average
> exposure for the site, and other continuous attributes, as well as one
> presence/absence covariate.
> - Initial exploratory analysis has shown that the counts are zero-inflated.
> While there may be some false zeros in our dataset (i.e., observer error),
> the source of the zero-inflation is likely preference of shorebirds for
> particular sites with particular features and avoidance of others (i.e.,
> true zeros or “structural zeros”). Some zeros likely also arise because the
> species does not saturate its habitat (i.e., habitat suitable, but
> unoccupied – also a “true” zero), though again, the majority of the zeros
> are likely structural.
> Onto my questions:
> 1) I’ve been reading through the literature to decide what type of model
> would best be suited for our dataset and questions. While all articles seem
> to agree that the choice of a model needs to consider the source of excess
> zeros, they seem to contradict one another in regards to what zeros are
> being modeled in each component of a zero-inflated mixture model. Note that
> I am not considering a two-part (i.e., conditional) model, because I do not
> believe that all zeros arise from the occupancy process (as per Joseph et
> al. 2009 and as noted above, zero abundance can occur by chance in our
> system). Examples:
> - Martin et al. (2005) state that when zero inflation is due to true zeros,
> two-part or mixture models (ZIP or ZINB) are recommended, and that when
> zero inflation is due to false zeros, a ZIB mixture model is recommended;
> however, when zero inflation is due to both excess true and false zeros, a
> Bayesian framework may be used, though there is no formal discussion in the
> literature. NOTE: Since this article was published, Royle’s N-mixture model
> has addressed this issue; however, I cannot use this approach as my data do
> not meet the assumption of a closed population during the study period.
> - In contrast to Martin et al. (2005), Potts and Elith (2006) state that
> the zero-inflated mixture model structure implies that zero observations
> arising from the zero process are true negative observations, and that
> those arising from the Poisson process are false negative observations “that
> is, the habitat is suitable, but unoccupied” (p.155). However, on the
> previous page, they defined false negative as “attributable to experimental
> design… or observer error”, and habitat that is “suitable, but unoccupied”
> as a true negative, so I'm not sure which type of zero observation they are
> really referring to here for the Poisson process.
> - In contrast to both sources above, Zuur et al. (2009) state that in a ZIP
> or ZINB, zeros are modeled as coming from two processes – the binomial
> process, which models only false zeros (observer, design, and survey error)
> and the Poisson (or Negbin) process, which models the true zeros and
> counts. This is the opposite of what was stated by Potts and Elith.
> - Finally, I’ve read other sources which state that ZIPs simply treat the
> population as a mixture, with one set of subjects having a zero response –
> in other words, there is no mention of whether the zero process is modeling
> the “true” or “false” zeros.
> Thinking about my system: there are a bunch of sites where the birds (of a
> given species) never go (habitat is unsuitable), and a bunch where they do
> go with varying levels of abundance (habitat is suitable, but some sites
> are more favored than others, based on habitat features). Following the
> last bullet above, a site that is suitable may have a count of zero simply
> because the species wasn’t present there on the survey day (i.e., true zero
> occurring by chance). Given the contradicting information above, and the
> consensus on the importance of considering the source of zeros in model
> selection, I would very much appreciate if someone could clear this up for
> me - or let me know if I'm completely missing something here? Perhaps this
> question should be posed on a stats forum, but given question 2 below, I
> thought I'd try here first.
> 2) Assuming that I’m on the right track with a ZIP, is there a package I
> can use to model a ZIP with a random effect for site? I looked at glmmADMB;
> however, the zero inflation can only be modeled as a constant. This doesn’t
> make sense for my system, as the zero-inflation will be a function of
> habitat covariates (see above). Likewise, glmmPQL is not an option, as this
> method does not yield log-likelihoods (and thus no AIC). I’m also thinking
> that the random effect will have to be included in the zero process as well
> – is this right?
Some of your jargon is unfamiliar to me--"true" and "false" zeros. I
suppose a false zero would be the result of a "hurdle process" (as in
the pscl package).  I've not seen a hurdle model joined in the same
model with a zero-inflation component.  Certainly not with "random
effects" apart from the inflated zeros.
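
For reference, here is a sketch of the usual ZIP mixture. The notation is
an assumption of this note, not taken from the papers cited above: pi is
the probability that an observation comes from the always-zero component,
and lambda is the Poisson mean for the remaining observations.

```latex
\begin{aligned}
P(Y = 0) &= \pi + (1 - \pi)\, e^{-\lambda}, \\
P(Y = k) &= (1 - \pi)\, \frac{e^{-\lambda} \lambda^{k}}{k!}, \qquad k = 1, 2, \dots
\end{aligned}
```

Zeros therefore come from both components, and the likelihood itself does
not label either kind as "true" or "false"; that reading comes from the
application. A hurdle model, by contrast, sends every zero through the
binary part and uses a zero-truncated count distribution for the positive
counts.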

I do not believe there is an ML solution for your problem
within easy reach. However, there are Bayesian answers. Please see the
package MCMCglmm.  It has a very well done pair of vignettes.

MCMCglmm has a ZIP family option, and you can add random effects.
Jarod Hadfield has been a regular contributor here and I think if you
post your working example code he and others will be glad to help out.
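
As a starting point, here is a minimal sketch of a ZIP fit in MCMCglmm
with a site random effect in both the count and the zero-inflation parts.
Everything about the data is an assumption: the data frame `birds`, the
covariate `sand`, and the simulated effect sizes are hypothetical
stand-ins, and the priors are loosely patterned on the zero-inflated
example in the MCMCglmm course notes rather than tuned for real data.

```r
library(MCMCglmm)

## Hypothetical data: 30 sites, 6 counts per site, one habitat covariate.
set.seed(1)
n_site  <- 30
n_visit <- 6
site_id <- rep(seq_len(n_site), each = n_visit)

sand_by_site <- runif(n_site)            # made-up habitat covariate
site_effect  <- rnorm(n_site, sd = 0.5)  # made-up site-level variation

birds <- data.frame(
  site = factor(site_id),
  sand = sand_by_site[site_id]
)

## Extra zeros are more likely where 'sand' is low (an assumption).
p_extra_zero <- plogis(1 - 3 * birds$sand)
extra_zero   <- rbinom(nrow(birds), 1, p_extra_zero)
lambda       <- exp(0.5 + birds$sand + site_effect[site_id])
birds$count  <- ifelse(extra_zero == 1, 0L, rpois(nrow(birds), lambda))

## A zipoisson response is handled as two "traits": level 1 is the
## Poisson part and level 2 is the zero-inflation part.
## The residual variance of the zero-inflation part is fixed at 1
## (fix = 2) because it cannot be estimated from the data.
prior <- list(
  R = list(V = diag(2), nu = 0.002, fix = 2),
  G = list(G1 = list(V = diag(2), nu = 0.002))
)

fit <- MCMCglmm(
  count ~ trait - 1 +
    at.level(trait, 1):sand +   # covariate effect on the Poisson mean
    at.level(trait, 2):sand,    # covariate effect on zero inflation
  random  = ~ idh(trait):site,  # separate site variance for each part
  rcov    = ~ idh(trait):units,
  family  = "zipoisson",
  prior   = prior,
  data    = birds,
  verbose = FALSE
)

summary(fit)
```

The `at.level(trait, 2)` term is what lets the zero inflation depend on
habitat covariates rather than being a constant. The default chain length
is used here only to keep the sketch short; real data would need longer
runs and checks on mixing.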


Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu
