[R-sig-ME] Zero-inflated mixed effects model - clarification of zeros modeled and R package questions

Paul Johnson pauljohn32 at gmail.com
Fri Jun 22 05:43:53 CEST 2012


Dear Jennifer:
Response below

On Wed, Jun 20, 2012 at 5:32 PM, Jennifer Barrett
<jenn.s.barrett at gmail.com> wrote:
> Hi folks,
>
>
> I’m looking for some guidance in regards to zero-inflated models with
> repeated measures (i.e., random effect for site). My first question is more
> of a statistical one, while the second is related to R packages. Apologies
> for the long post; however, I want to make sure my concerns/questions are
> clear!
>
>
> Our project and dataset:
>
>
> - The aim of our project is to 1) examine associations between shoreline
> habitat characteristics and the abundance of several shorebird species; and
> 2) estimate the total abundance of each shorebird species within the entire
> study region based on the models from 1) above, with confidence intervals.
> Note that we will be using an information theoretic approach for 1) above,
> and would like to use MMI for 2).
>
> - Our response dataset consists of counts of shorebirds at >150 coastal
> sites, conducted on the second Sunday of each month between the months of
> Oct-March, over 10 years; however, not every site was surveyed in all
> months (we’ve limited our dataset to those with a minimum of 3 counts in a
> year).  Our response variable is thus the number of birds counted in a
> given month/year at a given site. Note that we plan to model each year
> separately.
>
> -  The habitat dataset consists of shoreline units within our entire study
> region, with each unit characterized by exposure, substrate type...etc.
> Using GIS, we’ve measured the length of shoreline belonging to shoreline
> categories (e.g., sand, rock, mud) within each survey site, the average
> exposure for the site, and other continuous attributes, as well as one
> presence/absence covariate.
>
> - Initial exploratory analysis has shown that the counts are zero-inflated.
> While there may be some false zeros in our dataset (i.e., observer error),
> the source of the zero-inflation is likely preference of shorebirds for
> particular sites with particular features and avoidance of others (i.e.,
> true zeros or “structural zeros”). Some zeros likely also arise because the
> species does not saturate its habitat (i.e., habitat suitable, but
> unoccupied – also a “true” zero), though again, the majority of the zeros
> are likely structural.
>
>
> Onto my questions:
>
>
> 1) I’ve been reading through the literature to decide what type of model
> would best be suited for our dataset and questions. While all articles seem
> to agree that the choice of a model needs to consider the source of excess
> zeros, they seem to contradict one another in regards to what zeros are
> being modeled in each component of a zero-inflated mixture model. Note that
> I am not considering a two-part (i.e., conditional) model, because I do not
> believe that all zeros arise from the occupancy process (as per Joseph et
> al. 2009 and as noted above, zero abundance can occur by chance in our
> system). Examples:
>
>
> - Martin et al. (2005) state that when zero inflation is due to true zeros,
> two-part or mixture models (ZIP or ZINB) are recommended, and that when
> zero inflation is due to false zeros, a ZIB mixture model is recommended;
> however, when zero inflation is due to both excess true and false zeros, a
> Bayesian framework may be used, though there is no formal discussion in the
> literature. NOTE: Since this article was published, Royle’s N-mixture model
> has addressed this issue; however, I cannot use this approach as my data do
> not meet the assumption of a closed population during the study period.
>
> - In contrast to Martin et al. (2005), Potts and Elith (2006) state that
> the zero-inflated mixture model structure implies that zero observations
> arising from the zero process are true negative observations, and that
> those arising from the Poisson process are false negative observations “that
> is, the habitat is suitable, but unoccupied” (p.155). However, on the
> previous page, they defined false negative as “attributable to experimental
> design… or observer error”, and habitat that is “suitable, but unoccupied”
> as a true negative, so I'm not sure which type of zero observation they are
> really referring to here for the Poisson process.
>
> - In contrast to both sources above, Zuur et al. (2009) state that in a ZIP
> or ZINB, zeros are modeled as coming from two processes – the binomial
> process, which models only false zeros (observer, design, and survey error)
> and the Poisson (or Negbin) process, which models the true zeros and
> counts. This is the opposite of what was stated by Potts and Elith.
>
> - Finally, I’ve read other sources which state that ZIPs simply treat the
> population as a mixture, with one set of subjects having a zero response –
> in other words, there is no mention of whether the zero process is modeling
> the “true” or “false” zeros.
>
>
> Thinking about my system: there are a bunch of sites where the birds (of a
> given species) never go (habitat is unsuitable), and a bunch where they do
> go with varying levels of abundance (habitat is suitable, but some sites
> are more favored than others, based on habitat features). Following the
> last bullet above, a site that is suitable may have a count of zero simply
> because the species wasn’t present there on the survey day (i.e., true zero
> occurring by chance). Given the contradicting information above, and the
> consensus on the importance of considering the source of zeros in model
> selection, I would very much appreciate if someone could clear this up for
> me - or let me know if I'm completely missing something here? Perhaps this
> question should be posed on a stats forum, but given question 2 below, I
> thought I'd try here first.
>
>
> 2) Assuming that I’m on the right track with a ZIP, is there a package I
> can use to model a ZIP with a random effect for site? I looked at glmmADMB;
> however, the zero inflation can only be modeled as a constant. This doesn’t
> make sense for my system, as the zero-inflation will be a function of
> habitat covariates (see above). Likewise, glmmPQL is not an option, as this
> method does not yield log-likelihoods (and thus no AIC). I’m also thinking
> that the random effect will have to be included in the zero process as well
> – is this right?
>
Some of your jargon is unfamiliar to me--"true" and "false" zeros. I
suppose a false zero would be the result of a "hurdle process" (as in
the pscl package).  I've not seen a hurdle model joined in the same
model with a zero-inflation model, and certainly not with "random
effects" apart from the inflated zeros.
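
For reference, here is a sketch of the two textbook structures as I
understand them. The symbols pi (probability of the zero component)
and lambda (Poisson mean) are my own labels, not notation taken from
the articles you cite. In a ZIP, a zero can come from either part; in
a hurdle model, every zero comes from the binary part and the positive
counts follow a zero-truncated Poisson.

```latex
% Zero-inflated Poisson (mixture): zeros arise from both components
\Pr(Y = 0) = \pi + (1 - \pi)\, e^{-\lambda}, \qquad
\Pr(Y = k) = (1 - \pi)\, \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 1, 2, \dots

% Hurdle Poisson (two-part): all zeros arise from the binary component
\Pr(Y = 0) = \pi, \qquad
\Pr(Y = k) = (1 - \pi)\, \frac{\lambda^{k} e^{-\lambda}}{k!\,\bigl(1 - e^{-\lambda}\bigr)}, \quad k = 1, 2, \dots
```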

I do not believe there is an ML solution for your problem within easy
reach. However, there are Bayesian answers. Please see the package
MCMCglmm.  It has a very well done pair of vignettes.

MCMCglmm has a ZIP family option (family = "zipoisson"), and you can
add random effects. Jarod Hadfield has been a regular contributor here,
and I think if you post your working example code he and others will be
glad to help out.
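
To give a rough idea of the shape such a model might take, here is a
minimal sketch with a site random effect and one habitat covariate in
both the count part and the zero-inflation part. The data frame `dat`
and the columns `count`, `sand`, and `site` are made up for
illustration and the data are simulated. The priors are placeholders
rather than recommendations, and the chain length is just the package
default. Treat it as a starting point to check against the MCMCglmm
course notes, not as a finished analysis.

```r
library(MCMCglmm)

# Simulated stand-in data; 'dat', 'count', 'sand', 'site' are hypothetical names.
set.seed(1)
n_site <- 30
n_rep  <- 6
dat <- data.frame(
  site = factor(rep(seq_len(n_site), each = n_rep)),
  sand = rep(runif(n_site), each = n_rep)  # site-level habitat covariate
)
site_eff   <- rnorm(n_site, sd = 0.5)
lambda     <- exp(0.5 + 1.0 * dat$sand + site_eff[as.integer(dat$site)])
p_zero     <- plogis(0.5 - 2.0 * dat$sand)  # chance of a structural zero
structural <- rbinom(nrow(dat), 1, p_zero)
dat$count  <- ifelse(structural == 1, 0L, rpois(nrow(dat), lambda))

# 'trait' has two levels: 1 = Poisson part, 2 = zero-inflation part.
# The residual variance of the zero-inflation part cannot be estimated,
# so it is fixed at 1 (fix = 2).
prior <- list(
  R = list(V = diag(2), nu = 0.002, fix = 2),
  G = list(G1 = list(V = diag(2), nu = 0.002))
)

fit <- MCMCglmm(
  count ~ trait - 1 + at.level(trait, 1):sand + at.level(trait, 2):sand,
  random  = ~ idh(trait):site,   # separate site variance for each part
  rcov    = ~ idh(trait):units,
  family  = "zipoisson",
  prior   = prior,
  data    = dat,
  nitt = 13000, burnin = 3000, thin = 10,
  verbose = FALSE
)
summary(fit)
```

The idh(trait):site term gives the count part and the zero-inflation
part their own site variances with no covariance between them. Whether
that is the right structure for your birds is a modeling decision I
cannot make for you.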

pj



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu


