[R-sig-eco] hurdle model

Thu Aug 19 15:22:00 CEST 2010

On Thu, 2010-08-19 at 14:54 +0300, Gavin Simpson wrote:
> On Thu, 2010-08-19 at 13:20 +0200, Yingjie Zhang wrote:

> They fit several models and compare them:
> 
>      I. Poisson
>     II. Negative Binomial
>    III. Quasi-likelihood
>     IV. Hurdle model
>      V. Zero-inflated model
> 
> III should be a quasi-Poisson model, i.e. you fit the Poisson GLM using
> quasi-likelihood and estimate the dispersion parameter \phi alongside
> the usual Poisson GLM parameters.
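In symbols, a quasi-Poisson fit keeps the Poisson mean model but lets the variance be proportional to the mean. \phi is then commonly estimated from the Pearson residuals, where n is the number of observations and p the number of regression parameters. Setting \phi = 1 recovers the ordinary Poisson GLM.

```latex
\operatorname{Var}(Y_i) = \phi\,\mu_i, \qquad
\hat\phi = \frac{1}{n - p} \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\hat\mu_i}
```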
> 
> Section 2.3 of their paper, on the hurdle model, doesn't even mention
> "quasi", though they do mention it in Table 2.
> 
> Reading this, I think they cooked up this model themselves: you can fit
> a binomial model yourself for presence/absence and then fit a count
> model for the samples predicted to be present from the binomial part.
> To make things simple I suspect they fitted the count part as
> quasi-Poisson, but nowhere does it say exactly what they did.
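The two-stage, hand-rolled fit described above can be sketched for the simplest case as follows. The sketch assumes no covariates (intercept-only), so both stages have closed-form estimates. The helper name `two_part_intercept_only` is hypothetical. The sketch fits the count stage to the samples with observed non-zero counts, which is the usual hurdle setup. This is an illustration only, not what the paper's authors did.

```python
def two_part_intercept_only(counts):
    """Hypothetical helper: hand-rolled, intercept-only two-part fit.

    Stage 1 (binomial): with no covariates the ML estimate of Pr(y > 0)
    is the proportion of non-zero samples.
    Stage 2 (count): an intercept-only Poisson or quasi-Poisson glm fitted
    to the non-zero samples estimates their mean as the sample average.
    This second stage ignores the fact that zeros cannot occur there.
    """
    if not counts:
        raise ValueError("need at least one observation")
    presence = [1 if y > 0 else 0 for y in counts]   # binary response for stage 1
    positives = [y for y in counts if y > 0]         # data for stage 2
    p_present = sum(presence) / len(presence)
    if not positives:
        return p_present, float("nan")
    mean_positive = sum(positives) / len(positives)
    return p_present, mean_positive


# Example: 7 samples, 4 of them non-zero.
p_hat, mean_pos = two_part_intercept_only([0, 0, 1, 2, 3, 0, 6])
print(p_hat, mean_pos)
```

With covariates, each stage would be an ordinary glm() call on the corresponding subset. The weak point is the second stage, which treats the positive counts as if they were untruncated.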

I know that at least Jane Elith has an email address (I used it years
ago), so you could ask her. However, it may be that their hurdle model
uses just Poisson, and there is a minor mistake in their Table 2.

You can use quasipoisson() or poisson() in glm() in a very natural way:
the fitting happens via iteratively reweighted least squares, and all
you need to define is the relationship between the fitted values and
the variance. If you look at the poisson() and quasipoisson() family
functions in R (these provide the backbone of glm(..., family = )), you
see that the differences are that quasipoisson()$aic() always returns
NA, and that quasipoisson() lacks the simulate() component. Otherwise
they work in a similar way, except that in poisson() the scale (\phi) is
taken to be 1, whereas in quasipoisson() the scale is estimated from the
fitted model. You then multiply the standard errors by the square root
of the scale, use F tests instead of Chisq tests in anova(), etc.
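The shared IRLS machinery and the quasi-Poisson rescaling can be sketched as follows. The sketch assumes a log-link Poisson GLM with one covariate, log(mu) = b0 + b1 * x, so that each weighted least-squares step is a 2x2 solve. The function names `poisson_irls` and `quasi_poisson_summary` are hypothetical. The Pearson-based estimate of \phi is, as far as I know, the one summary.glm() uses, but treat that as an assumption.

```python
import math


def poisson_irls(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x as a Poisson GLM by IRLS.

    Returns the coefficients, the fitted means and the unscaled
    covariance matrix (X'WX)^-1, i.e. the one that takes phi = 1.
    """
    # Starting values in the spirit of poisson()$initialize (mu = y + 0.1).
    mu = [yi + 0.1 for yi in y]
    eta = [math.log(m) for m in mu]
    b = (0.0, 0.0)
    for _ in range(max_iter):
        # Log link: working response z = eta + (y - mu) / mu and
        # working weight w = (dmu/deta)^2 / Var(y) = mu^2 / mu = mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s0 * s2 - s1 * s1
        # Solve the 2x2 weighted least-squares normal equations.
        new_b = ((s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det)
        eta = [new_b[0] + new_b[1] * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        converged = max(abs(nb - ob) for nb, ob in zip(new_b, b)) < tol
        b = new_b
        if converged:
            break
    # Unscaled covariance, using the weights (= mu) at the final fit.
    s0 = sum(mu)
    s1 = sum(m * xi for m, xi in zip(mu, x))
    s2 = sum(m * xi * xi for m, xi in zip(mu, x))
    det = s0 * s2 - s1 * s1
    cov = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]
    return b, mu, cov


def quasi_poisson_summary(x, y):
    """Same point estimates as Poisson; quasi-Poisson only rescales uncertainty."""
    b, mu, cov = poisson_irls(x, y)
    se_poisson = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
    # Pearson estimate of phi: sum((y - mu)^2 / mu) / (n - p), here p = 2.
    phi = sum((yi - m) ** 2 / m for yi, m in zip(y, mu)) / (len(y) - 2)
    # Var(b) = phi * (X'WX)^-1, so the standard errors scale by sqrt(phi).
    se_quasi = [math.sqrt(phi) * s for s in se_poisson]
    return b, phi, se_poisson, se_quasi


b, phi, se_p, se_q = quasi_poisson_summary([0, 0, 1, 1], [1, 3, 4, 6])
print(b, phi, se_p, se_q)
```

With two groups (x = 0/1), the fitted means are the group means (2 and 5). The coefficients are therefore log(2) and log(2.5), and the Pearson dispersion works out to 0.7. Only the standard errors change between the two families.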

I am not sure (or actually, I don't think) that this fitting parallelism
extends to the *truncated* Poisson used in pscl::hurdle(). Although you
can fit by stages, and fit a quasipoisson() glm to the above-zero
values, I don't think this is the correct thing to do when the count
part is not allowed to produce new zeros. The truncated Poisson
likelihood model is, however, a huge improvement over hand-fitting a
glm by iteratively reweighted least squares while assuming a constant
variance/fit relationship.
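Why truncation matters can be seen in the intercept-only case. The zero-truncated Poisson has P(Y = y | Y > 0) = exp(-lam) lam^y / (y! (1 - exp(-lam))), and its mean is lam / (1 - exp(-lam)). Below is a minimal sketch, assuming intercept-only data, with hypothetical helper names `ztp_loglik` and `ztp_mle`. It compares the truncated maximum-likelihood estimate of lam with the naive Poisson mean of the positive counts.

```python
import math


def ztp_loglik(lam, positives):
    """Zero-truncated Poisson log-likelihood for counts y >= 1."""
    log_norm = math.log(-math.expm1(-lam))  # log(1 - exp(-lam)), stable for small lam
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) - log_norm
               for y in positives)


def ztp_mle(positives, iters=200):
    """Intercept-only zero-truncated Poisson MLE by bisection.

    The score equation is lam / (1 - exp(-lam)) = mean(y). The left-hand
    side increases from 1 (as lam -> 0) and always exceeds lam, so for
    mean(y) > 1 the root lies in (0, mean(y)).
    """
    ybar = sum(positives) / len(positives)
    if ybar <= 1.0:
        return 0.0  # all ones: the MLE sits on the boundary lam -> 0
    lo, hi = 1e-12, ybar
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < ybar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


positives = [1, 2, 3, 6]
naive = sum(positives) / len(positives)  # untruncated Poisson estimate of lam
lam_hat = ztp_mle(positives)
print(naive, lam_hat)
```

The truncated estimate comes out smaller than the naive mean. The naive fit estimates the conditional mean E[Y | Y > 0], which always exceeds lam. With covariates, log(lam_i) would be made linear in the predictors and the same likelihood maximized. I believe this is what the count part of pscl::hurdle(..., dist = "poisson") does, but check its documentation.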

If you are worried about overdispersion in the above-zero count data,
use the truncated negative binomial model offered by pscl::hurdle(). It
is designed for exactly this purpose (and has a more exciting narrative
for ecologists).
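The truncated negative binomial can be sketched the same way. The sketch assumes the mean/size parameterization (mu, theta) used by R's dnbinom(size = theta, mu = mu), under which Var(Y) = mu + mu^2 / theta. This lets the positive counts be more variable than Poisson, and the model approaches the Poisson as theta grows. Truncation divides by 1 - P(Y = 0), where P(Y = 0) = (theta / (theta + mu))^theta. The helper names `ztnb_logpmf` and `ztnb_mean` are hypothetical, and this is not pscl's code.

```python
import math


def ztnb_logpmf(y, mu, theta):
    """Log pmf of a zero-truncated negative binomial for y >= 1.

    Untruncated NB with mean mu and size theta: Var(Y) = mu + mu**2 / theta.
    """
    if y < 1:
        raise ValueError("zero-truncated: y must be >= 1")
    log_p0 = -theta * math.log1p(mu / theta)  # log P(Y = 0) of the untruncated NB
    log_nb = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
              + log_p0 + y * (math.log(mu) - math.log(theta + mu)))
    return log_nb - math.log(-math.expm1(log_p0))  # divide by 1 - P(Y = 0)


def ztnb_mean(mu, theta):
    """Mean of the truncated distribution: E[Y | Y > 0] = mu / (1 - P(Y = 0))."""
    log_p0 = -theta * math.log1p(mu / theta)
    return mu / -math.expm1(log_p0)


mu, theta = 3.0, 1.5
print(ztnb_mean(mu, theta))
print(sum(math.exp(ztnb_logpmf(y, mu, theta)) for y in range(1, 200)))  # close to 1
```

Fitting would maximize the sum of `ztnb_logpmf` over the positive counts in both mu (through the linear predictor) and theta.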

Cheers, Jari Oksanen