[R] Zero inflation model - pscl package

Thu Feb 25 14:59:21 CET 2010

On Wed, 24 Feb 2010, Nicholas M. Caruso wrote:

> I have some questions regarding Zero Inflation Poisson models.
>
> I am using count data to analyze abundance trends of salamanders.  However,
> I have surveys which differ in the amount of effort (i.e. the number of
> people searching and amount of time - I am using a museum database so not
> all surveys were conducted by me).  Therefore I need to account for the
> effort.  If I change the count (response variable) then it will have decimals
> and not be usable in this model.  So I decided to put this term into the
> independent variable.

The usual approach would be the following: If you think that the log of 
y/n (expected response per unit of effort) is linear in a set of covariates 
x with coefficients b, you would typically write

   log(y/n) = x'b

which can be transformed to

   log(y) - log(n) = x'b
   log(y)          = x'b + log(n)

i.e., the log-effort would be an additional regressor with its coefficient 
fixed to 1. This is called an offset, so the R formula would be

   y ~ x + offset(log(n))

Alternatively, instead of assuming that the coefficient is exactly 1, 
you can estimate it and test that assumption, i.e.

   y ~ x + log(n)
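
A minimal sketch of both formulas, using a plain Poisson glm() on simulated
data so that it runs without extra packages. The variable names (effort, x, y)
and all parameter values are made up for illustration and are not from the
original data; the Wald test against 1 is one simple way to check the offset
assumption, not the only one.

```r
# Simulated data: names and parameter values are illustrative assumptions.
set.seed(1)
n_obs  <- 500
effort <- runif(n_obs, min = 0.5, max = 8)   # e.g. person-hours per survey
x      <- rnorm(n_obs)

# True model: log(E[y]) = 0.3 + 0.5 * x + 1 * log(effort)
y <- rpois(n_obs, lambda = effort * exp(0.3 + 0.5 * x))

# (a) Offset: coefficient of log(effort) is fixed to 1
fit_offset <- glm(y ~ x + offset(log(effort)), family = poisson)

# (b) Free coefficient: estimate it, then compare it with 1
fit_free <- glm(y ~ x + log(effort), family = poisson)

b_eff  <- coef(fit_free)["log(effort)"]
se_eff <- sqrt(diag(vcov(fit_free)))["log(effort)"]
z_vs_1 <- (b_eff - 1) / se_eff        # Wald statistic for H0: coefficient = 1
p_vs_1 <- 2 * pnorm(-abs(z_vs_1))     # two-sided p-value

print(coef(fit_offset))
print(c(estimate = unname(b_eff), z = unname(z_vs_1), p = unname(p_vs_1)))
```

Note that the z test printed by summary(fit_free) for log(effort) tests the
coefficient against 0, not against 1, which is why the statistic is computed
by hand here.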

> I am analyzing Historic vs. Current surveys.
>
> Here is an example of my code:
> require(pscl)
> model <- zeroinfl(Sallys~Survey:Person.Hours, dist="poisson", EM=TRUE)
> summary(model)

I think I would allow different intercepts as well, i.e.,

   zeroinfl(Sallys ~ Survey * log(Person.Hours))
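
A hedged sketch of how this formula could be fitted and read. The data frame
dat is simulated here purely for illustration (the real salamander data are
not available), the parameter values are arbitrary, and the code assumes the
pscl package is installed. Only the variable names Sallys, Survey and
Person.Hours come from the original post.

```r
# Simulated stand-in data: all values below are illustrative assumptions.
set.seed(42)
n_obs <- 400
dat <- data.frame(
  Survey       = factor(rep(c("Current", "Historic"), each = n_obs / 2)),
  Person.Hours = runif(n_obs, min = 0.5, max = 10)
)
# Count part: rate proportional to effort, higher for Historic surveys
mu <- dat$Person.Hours * exp(ifelse(dat$Survey == "Historic", -0.4, -1.0))
# Zero part: 30% structural (excess) zeros
excess_zero <- rbinom(n_obs, size = 1, prob = 0.3)
dat$Sallys  <- ifelse(excess_zero == 1, 0L, rpois(n_obs, lambda = mu))

if (requireNamespace("pscl", quietly = TRUE)) {
  # Same regressors in the count and the zero-inflation part
  zip_fit <- pscl::zeroinfl(Sallys ~ Survey * log(Person.Hours),
                            data = dat, dist = "poisson")
  count_tab <- summary(zip_fit)$coefficients$count
  print(count_tab)
}
```

With treatment contrasts, the SurveyHistoric row compares the two intercepts
and the SurveyHistoric:log(Person.Hours) row compares the two effort slopes,
so the surveys are tested against each other rather than each against zero.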

> I have received some very significant results on most of them and on some
> that I thought wouldn't be significant turned out to be.  So I am concerned
> with the model being appropriate.  I created a simulated database and ran a
> simple glm to see if y/b ~ x is the same as y~x:b and it is not (not
> surprisingly).  Does anyone have suggestions for how to adjust my model to
> allow for these comparisons?  I cannot use a glm with Poisson error because
> of overdispersion and a lot of zeroes.  I thought about either rounding up
> my ratios or multiplying everything by 100 to eliminate the decimals but to
> keep the variation (I am not pleased with either of those options)
>
> On another note, I am having a little trouble interpreting the results (I
> think).  Which this may not matter if I cannot use the ZIP model.  Is the
> Count model coefficients (poisson with log link) the measure of if the sites
> differ and if so what do the estimates for both surveys indicate?  Is that
> the mean for both surveys and it is testing them against zero?  If so I want
> to test them against each other and I don't know exactly how to do that.
> Here is the output:
>                              Estimate Std. Error z value Pr(>|z|)
> (Intercept)                   1.97418    0.06570  30.048   <2e-16 ***
> SurveyCurrent:Person.Hours    0.04192    0.07597   0.552    0.581
> SurveyHistoric:Person.Hours   0.40221    0.01540  26.110   <2e-16 ***

This model forces the intercept to be the same for both the current and 
the historic sites, which is not so intuitive. The two slopes mean that 
for the historic sites the counts increased clearly with effort, while for 
the current sites they increased only slightly (not significantly).

> As for the "Zero-inflation model coefficients (binomial with logit link)":  I
> read that this is a measure of 1) suitability or 2) if the predictor of
> excess zeros was significant.  Which one of these (or is it something else)
> is correct and how do I interpret this?
>
> Here is a sample of a read out:
>
> Zero-inflation model coefficients (binomial with logit link):
>                              Estimate Std. Error z value Pr(>|z|)
> (Intercept)                   -1.1625     0.9833  -1.182    0.237
> SurveyCurrent:Person.Hours    -1.1787     1.1304  -1.043    0.297
> SurveyHistoric:Person.Hours   -0.5050     0.3440  -1.468    0.142

This part models the probability of additional (excess) zeros, which does 
not seem to depend on either site or effort.
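
To make the logit scale concrete, here is a small sketch that converts the
zero-inflation estimates quoted above into probabilities with plogis(), the
inverse logit in base R. The effort values in hours are hypothetical and
chosen only for illustration; since none of the coefficients is significant,
the resulting probabilities should be read with corresponding caution.

```r
# Estimates copied from the zero-inflation output quoted above
b0       <- -1.1625   # (Intercept)
b_curr   <- -1.1787   # SurveyCurrent:Person.Hours
b_hist   <- -0.5050   # SurveyHistoric:Person.Hours

# Hypothetical effort values (person-hours), for illustration only
hours <- c(0, 1, 2)

# Linear predictor on the logit scale, then inverse logit
p_zero_current  <- plogis(b0 + b_curr * hours)
p_zero_historic <- plogis(b0 + b_hist * hours)

print(data.frame(hours, p_zero_current, p_zero_historic))
```

A negative slope means that the estimated probability of an excess zero
decreases as effort increases.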

For an introduction to the zero-inflation model and its implementation in 
R see
   vignette("countreg", package = "pscl")

Also, I would recommend considering hurdle() models as well. They often 
give similar fits and are slightly easier to interpret (IMO).
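
A rough sketch of such a comparison, assuming the pscl package is installed.
The data are simulated stand-ins with arbitrary parameter values (only the
variable names follow the original post), and AIC is used here merely as one
convenient way to put the two fits side by side.

```r
# Simulated stand-in data: all values below are illustrative assumptions.
set.seed(7)
n_obs <- 400
dat <- data.frame(
  Survey       = factor(rep(c("Current", "Historic"), each = n_obs / 2)),
  Person.Hours = runif(n_obs, min = 0.5, max = 10)
)
mu <- dat$Person.Hours * exp(ifelse(dat$Survey == "Historic", -0.4, -1.0))
excess_zero <- rbinom(n_obs, size = 1, prob = 0.3)
dat$Sallys  <- ifelse(excess_zero == 1, 0L, rpois(n_obs, lambda = mu))

if (requireNamespace("pscl", quietly = TRUE)) {
  f <- Sallys ~ Survey * log(Person.Hours)
  zip_fit    <- pscl::zeroinfl(f, data = dat, dist = "poisson")
  hurdle_fit <- pscl::hurdle(f, data = dat, dist = "poisson")

  # Both classes provide logLik() methods, so AIC() can compare them
  aic_tab <- AIC(zip_fit, hurdle_fit)
  print(aic_tab)
}
```

In the hurdle model the binomial part describes zero versus positive counts
directly, whereas in the zero-inflation model it describes only the excess
zeros on top of those produced by the count distribution.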

hth,
Z

>
> Thanks for any suggestions/help!!
>
> -- 
> Nicholas M Caruso
> Graduate Student
> CLFS-Biology
> 4219 Biology-Psychology Building
> University of Maryland, College Park, MD 20742-5815
> phone: 301-405-6884
>
>
>
> ------------------------------------------------------------------
> I learned something of myself in the woods today,
> and walked out pleased for having made the acquaintance.
>