[R] Zero inflation model - pscl package
Achim.Zeileis at uibk.ac.at
Thu Feb 25 14:59:21 CET 2010
On Wed, 24 Feb 2010, Nicholas M. Caruso wrote:
> I have some questions regarding Zero Inflation Poisson models.
> I am using count data to analyze abundance trends of salamanders. However,
> I have surveys which differ in the amount of effort (i.e. the number of
> people searching and amount of time - I am using a museum database so not
> all surveys were conducted by me). Therefore I need to account for the
> effort. If I change the count (response variable) then it will have decimals
> and not be usable in this model. So I decided to put this term into the
> independent variable.
The usual approach would be the following: If you think that some link
function of y/n (response per effort) is linear in a set of covariates x
with coefficients b, you would typically write
log(y/n) = x'b
which can be transformed to
log(y) - log(n) = x'b
log(y) = x'b + log(n)
i.e., the log-effort would be an additional regressor with coefficient
fixed to 1. This is called an offset so the R formula would be
y ~ x + offset(log(n))
Alternatively, instead of assuming that the coefficient is exactly
1, you can estimate it and test that assumption, i.e.
y ~ x + log(n)
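
To make the difference between the two formulas concrete, here is a minimal
sketch on simulated data. The data frame d, its columns and the true
coefficients (0.5 and 0.3) are made up for illustration, and a plain Poisson
glm() is used only to keep the example in base R.

```r
## Simulated data: counts proportional to effort n (illustrative values only)
set.seed(1)
d <- data.frame(x = rnorm(500), n = sample(1:10, 500, replace = TRUE))
d$y <- rpois(500, lambda = d$n * exp(0.5 + 0.3 * d$x))

## coefficient of log(n) fixed to 1 via an offset
m_off <- glm(y ~ x + offset(log(n)), family = poisson, data = d)

## coefficient of log(n) estimated freely
m_est <- glm(y ~ x + log(n), family = poisson, data = d)

coef(m_off)
coef(m_est)                          # log(n) coefficient should be near 1
confint.default(m_est)["log(n)", ]   # Wald interval: does it cover 1?
```

If the interval for log(n) clearly excludes 1, counts are not simply
proportional to effort and the offset version would be too restrictive.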
> I am analyzing Historic vs. Current surveys.
> Here is an example of my code:
> model <- zeroinfl(Sallys~Survey:Person.Hours, dist="poisson", EM=TRUE)
I think I would allow different intercepts as well, i.e.,
zeroinfl(Sallys ~ Survey * log(Person.Hours))
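
A sketch of how such a call might look in full, on a simulated stand-in for
your data (the data frame sal, the 30% share of structural zeros and the
effect sizes are all assumptions for illustration). The "| 1" part keeps the
zero-inflation component at an intercept only; that is a choice made here, not
something your data necessarily support.

```r
## Hypothetical data frame standing in for the real survey data
set.seed(2)
sal <- data.frame(
  Survey = factor(rep(c("Current", "Historic"), each = 150)),
  Person.Hours = runif(300, 0.5, 8)
)
mu <- sal$Person.Hours * exp(ifelse(sal$Survey == "Historic", 1.0, 0.5))
zero <- rbinom(300, 1, 0.3)               # assumed 30% structural zeros
sal$Sallys <- ifelse(zero == 1, 0, rpois(300, mu))

if (requireNamespace("pscl", quietly = TRUE)) {
  ## count part: separate intercepts and log-effort slopes per survey
  ## zero-inflation part: intercept only
  m_zip <- pscl::zeroinfl(Sallys ~ Survey * log(Person.Hours) | 1,
                          data = sal, dist = "poisson")
  print(summary(m_zip))
}
```

If overdispersion remains after accounting for the extra zeros,
dist = "negbin" is available in zeroinfl() as well.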
> I have received some very significant results on most of them and on some
> that I thought wouldn't be significant turned out to be. So I am concerned
> with the model being appropriate. I created a simulated database and ran a
> simple glm to see if y/b ~ x is the same as y~x:b and it is not (not
> surprisingly). Does anyone have suggestions for how to adjust my model to
> allow for these comparisons? I cannot use a glm with Poisson error because
> of overdispersion and a lot of zeroes. I thought about either rounding up
> my ratios or multiplying everything by 100 to eliminate the decimals but to
> keep the variation (I am not pleased with either of those options)
> On another note, I am having a little trouble interpreting the results (I
> think), though this may not matter if I cannot use the ZIP model. Are the
> count model coefficients (Poisson with log link) the measure of whether the sites
> differ, and if so, what do the estimates for both surveys indicate? Is that
> the mean for both surveys and it is testing them against zero? If so I want
> to test them against each other and I don't know exactly how to do that.
> Here is the output:
>                             Estimate Std. Error z value Pr(>|z|)
> (Intercept)                  1.97418    0.06570  30.048   <2e-16
> SurveyCurrent:Person.Hours   0.04192    0.07597   0.552    0.581
> SurveyHistoric:Person.Hours  0.40221    0.01540  26.110   <2e-16 ***
Your formula forces the intercept to be the same for both the current and
the historic sites, which is not so intuitive. The two slopes mean that for
the historic sites the counts increased clearly with effort, but for the
current sites they increased only slightly (not significantly).
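
On testing the two surveys against each other: with Survey * log(Person.Hours)
and R's default treatment contrasts, the survey coefficients are already
differences between the two surveys. A minimal sketch on simulated data (the
data frame sal and the effect sizes are made up, and a plain Poisson glm() is
used as a stand-in to stay within base R; the formula logic carries over to
the count part of zeroinfl()):

```r
set.seed(3)
sal <- data.frame(
  Survey = factor(rep(c("Current", "Historic"), each = 200)),
  Person.Hours = runif(400, 0.5, 8)
)
sal$Sallys <- rpois(400, sal$Person.Hours *
                      exp(ifelse(sal$Survey == "Historic", 1.0, 0.5)))

## common intercept and slope vs. survey-specific intercept and slope
m0 <- glm(Sallys ~ log(Person.Hours), family = poisson, data = sal)
m1 <- glm(Sallys ~ Survey * log(Person.Hours), family = poisson, data = sal)

## "SurveyHistoric" is the Historic minus Current difference in intercepts,
## "SurveyHistoric:log(Person.Hours)" the difference in slopes
coef(summary(m1))

## joint likelihood ratio test of both differences
lr <- anova(m0, m1, test = "Chisq")
lr
```

The z tests in the coefficient table test each difference separately; the
anova() comparison tests both at once.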
> As for the "Zero-inflation model coefficients (binomial with logit link)": I
> read that this is a measure of 1) suitability or 2) if the predictor of
> excess zeros was significant. Which one of these (or is it something else)
> is correct and how do I interpret this?
> Here is a sample of a read out:
> Zero-inflation model coefficients (binomial with logit link):
>                             Estimate Std. Error z value Pr(>|z|)
> (Intercept)                  -1.1625     0.9833  -1.182
> SurveyCurrent:Person.Hours   -1.1787     1.1304  -1.043    0.297
> SurveyHistoric:Person.Hours  -0.5050     0.3440  -1.468    0.142
This reflects the probability of additional zeros which does not seem to
depend on either site or effort.
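
Since these coefficients are on the logit scale, plogis() converts them to
probabilities. A small sketch using the estimates from your output; the
effort of 2 person-hours is an arbitrary value picked for illustration, and
given the large standard errors the resulting probabilities are very
imprecise.

```r
## zero-inflation coefficients from the output above (logit scale)
b0     <- -1.1625   # (Intercept)
b_cur  <- -1.1787   # SurveyCurrent:Person.Hours
b_hist <- -0.5050   # SurveyHistoric:Person.Hours

## probability of an excess zero at Person.Hours = 0 (about 0.24)
plogis(b0)

## ... and at a hypothetical effort of 2 person-hours, by survey
plogis(b0 + b_cur * 2)
plogis(b0 + b_hist * 2)
```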
For an introduction to the zero-inflation model and its implementation in
pscl, see
vignette("countreg", package = "pscl")
Also, I would recommend considering hurdle() models as well. They often
give similar fits and are slightly easier to interpret (IMO).
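
A hurdle fit uses the same kind of formula. A minimal sketch, again on
simulated data (the data frame sal and all parameter values are assumptions
for illustration only):

```r
## Hypothetical data with extra zeros
set.seed(4)
sal <- data.frame(
  Survey = factor(rep(c("Current", "Historic"), each = 150)),
  Person.Hours = runif(300, 0.5, 8)
)
mu <- sal$Person.Hours * exp(ifelse(sal$Survey == "Historic", 1.0, 0.5))
sal$Sallys <- ifelse(rbinom(300, 1, 0.3) == 1, 0, rpois(300, mu))

if (requireNamespace("pscl", quietly = TRUE)) {
  ## zero part: binomial model for zero vs. positive count (the default)
  ## count part: truncated Poisson for the positive counts
  m_hurdle <- pscl::hurdle(Sallys ~ Survey * log(Person.Hours),
                           data = sal, dist = "poisson")
  print(summary(m_hurdle))
}
```

Here the zero part describes all zeros rather than only the "excess" ones,
which is what makes the coefficients easier to read.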
> Thanks for any suggestions/help!!
> Nicholas M Caruso
> Graduate Student
> 4219 Biology-Psychology Building
> University of Maryland, College Park, MD 20742-5815
> phone: 301-405-6884
> I learned something of myself in the woods today,
> and walked out pleased for having made the acquaintance.
> R-help at r-project.org mailing list
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.