[R-sig-eco] hurdle model

Thu Aug 19 16:35:00 CEST 2010

Dear All,

I had a quick look at the internal functions that pscl::hurdle uses for
the numerical optimization with optim(). They clearly correspond to the
hurdle model defined in the paper/vignette, where the zero component
is based on a right-censored random variable that is 0 if the
original count is 0 and 1 otherwise. The likelihood function for
the zero model corresponds to a censored Poisson model (with
zero.dist = "poisson"; the default zero.dist is "binomial"). The count
part is based on a left-truncated (zero-truncated) Poisson. This is
conditional thinking: the zero model tells us what determines whether
an observation is 0 or >0, and once the observations are >0, the count
model tells us what determines the exact count. If the estimates from
the two components are identical, the 0s can arise from the same
Poisson distribution as the counts. So this is not really a mixture,
as it is with the zeroinfl() model. The resulting log-likelihood is
still valid, and Table 1 clearly states that the hurdle model is fitted
by ML (maximum likelihood).
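To make the two-part structure concrete, here is a minimal base-R sketch (not pscl's internal code) that writes both log-likelihood pieces by hand and maximises each with optim(). It assumes simulated data with one covariate, log links, a censored Poisson for the zero part and a zero-truncated Poisson for the counts; the coefficient values are arbitrary. Because the two parts share no parameters, the hurdle log-likelihood is simply the sum of two separately maximised pieces.

```r
## Hurdle Poisson log-likelihood written by hand (illustrative sketch only).
## Assumed setup: simulated data, intercept + one covariate x, log links.
set.seed(1)
n <- 500
x <- runif(n)
X <- cbind(1, x)

## Simulate a hurdle process: cross the hurdle with P(y > 0) = 1 - exp(-lambda),
## then draw the positive count from a zero-truncated Poisson (by rejection).
lambda <- exp(drop(X %*% c(0.2, 0.5)))
crossed <- rbinom(n, 1, 1 - exp(-lambda)) == 1
mu_true <- exp(drop(X[crossed, , drop = FALSE] %*% c(0.5, 1)))
y <- numeric(n)
y[crossed] <- vapply(mu_true, function(m) {
  repeat { k <- rpois(1, m); if (k > 0) return(as.numeric(k)) }
}, numeric(1))
pos <- y > 0

## Zero part: right-censored Poisson, P(y = 0) = exp(-lam), P(y > 0) = 1 - exp(-lam)
negll_zero <- function(g) {
  lam <- exp(drop(X %*% g))
  -sum(ifelse(pos,
              ppois(0, lam, lower.tail = FALSE, log.p = TRUE),
              dpois(0, lam, log = TRUE)))
}

## Count part: zero-truncated Poisson, f(y) / P(y > 0), used only for y > 0
negll_count <- function(b) {
  mu <- exp(drop(X[pos, , drop = FALSE] %*% b))
  -sum(dpois(y[pos], mu, log = TRUE) -
         ppois(0, mu, lower.tail = FALSE, log.p = TRUE))
}

fit_zero  <- optim(c(0, 0), negll_zero, method = "BFGS")
fit_count <- optim(c(log(mean(y[pos])), 0), negll_count, method = "BFGS")

## The hurdle log-likelihood is the sum of the two maximised pieces
loglik <- -(fit_zero$value + fit_count$value)
rbind(zero = fit_zero$par, count = fit_count$par)
loglik
```

With pscl installed, hurdle(y ~ x, dist = "poisson", zero.dist = "poisson") should give comparable estimates on the same data, but treat that as something to verify rather than a guarantee.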

Hurdle fitting is therefore not the same estimation procedure as the
quasipoisson one, where a quasi-likelihood function is used to get the
estimates (note that the parameter estimates are the same as for the
Poisson, but the SEs differ and the dispersion parameter is estimated
instead of being fixed at 1).
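The claim that quasipoisson gives the same estimates as Poisson, with only the standard errors changing, is easy to check in base R. This sketch assumes simulated overdispersed counts (negative binomial draws, chosen only for illustration). It shows identical coefficients, standard errors that differ by the square root of the estimated dispersion, and no AIC for the quasipoisson fit.

```r
## Poisson vs quasipoisson GLM on the same overdispersed data (illustration only)
set.seed(2)
n <- 200
x <- runif(n)
y <- rnbinom(n, size = 2, mu = exp(1 + x))  # assumed data-generating process

fit_p  <- glm(y ~ x, family = poisson)
fit_qp <- glm(y ~ x, family = quasipoisson)

phi   <- summary(fit_qp)$dispersion   # Pearson-based estimate; poisson fixes it at 1
se_p  <- sqrt(diag(vcov(fit_p)))
se_qp <- sqrt(diag(vcov(fit_qp)))

all.equal(coef(fit_p), coef(fit_qp))  # same point estimates
all.equal(se_qp, se_p * sqrt(phi))    # SEs inflated by sqrt(phi)
fit_qp$aic                            # NA: no full likelihood, so no AIC
```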

To handle overdispersion other than zero inflation, one can choose the
negative binomial (dist = "negbin") instead of the Poisson in hurdle().
The quasipoisson family is not allowed there.
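With the negative binomial option, the count part becomes a zero-truncated negative binomial. The base-R sketch below writes that log-density by hand, checks that the truncated probabilities sum to 1, and fits it with optim(). The simulated data, the size value 1.5 and the log-scale parametrisation of theta are assumptions for illustration, not pscl's code.

```r
## Zero-truncated negative binomial log-density (mean mu, size theta)
ldztnb <- function(y, mu, theta) {
  dnbinom(y, size = theta, mu = mu, log = TRUE) -
    pnbinom(0, size = theta, mu = mu, lower.tail = FALSE, log.p = TRUE)
}

## Sanity check: probabilities over y = 1, 2, ... sum to (numerically) 1
total <- sum(exp(ldztnb(1:2000, mu = 3, theta = 0.8)))

## Simulated positive counts (assumed setup): NB with size 1.5, zeros rejected
set.seed(3)
n <- 400
x <- runif(n)
mu_true <- exp(0.5 + x)
y <- vapply(mu_true, function(m) {
  repeat { k <- rnbinom(1, size = 1.5, mu = m); if (k > 0) return(as.numeric(k)) }
}, numeric(1))
X <- cbind(1, x)

## Negative log-likelihood; theta is optimised on the log scale to keep it positive
negll <- function(par) {
  mu <- exp(drop(X %*% par[1:2]))
  -sum(ldztnb(y, mu, exp(par[3])))
}
fit <- optim(c(log(mean(y)), 0, 0), negll, method = "BFGS",
             control = list(maxit = 500))
c(beta = fit$par[1:2], theta = exp(fit$par[3]))
```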

Cheers,

Peter

Péter Sólymos
Alberta Biodiversity Monitoring Institute
and Boreal Avian Modelling project
Department of Biological Sciences
CW 405, Biological Sciences Bldg
University of Alberta
Edmonton, Alberta, T6G 2E9, Canada
Phone: 780.492.8534
Fax: 780.492.7635
email <- paste("solymos", "ualberta.ca", sep = "@")
http://www.abmi.ca
http://sites.google.com/site/psolymos

On Thu, Aug 19, 2010 at 7:22 AM, Jari Oksanen <jari.oksanen at oulu.fi> wrote:
> On Thu, 2010-08-19 at 14:54 +0300, Gavin Simpson wrote:
>> On Thu, 2010-08-19 at 13:20 +0200, Yingjie Zhang wrote:
>
>> They fit several models and compare them:
>>
>>      I. Poisson
>>     II. Negative Binomial
>>    III. Quasi-likelihood
>>     IV. Hurdle model
>>      V. zero-inflated model
>>
>> III should be a quasi-Poisson model, i.e. you fit the Poisson GLM using
>> quasi-likelihood and model the dispersion parameter \phi alongside the
>> usual Poisson GLM parameters.
>>
>> Section 2.3 of their paper on the hurdle model doesn't even mention
>> "quasi", though they do mention it in Table 2.
>>
>> Reading this, I think they cooked up this model themselves: you can fit
>> a binomial model yourself for presence/absence and then fit a count
>> model to the samples predicted to be present by the binomial part. To
>> make things simple I suspect they fitted the count part as quasi-Poisson,
>> but nowhere does it say exactly what they did.
>
> I know that at least Jane Elith has an email address (I used it
> years ago), so you could ask her. However, it may be that their hurdle
> model uses just Poisson, and there is a minor mistake in their Table 2.
>
> You can use quasipoisson() or poisson() in glm() in a very natural way:
> the fitting happens via iteratively reweighted least squares, and all
> you need to define is the relationship between fitted values and
> variance. If you look at the poisson() and quasipoisson() functions in R
> (these provide the backbone of glm(..., family=)), you see that the
> differences are that quasipoisson()$aic() always returns NA, and
> quasipoisson() lacks the simulate() item. Otherwise they work in the same
> way, except that in poisson() you take the scale (\phi) to be 1, and in
> quasipoisson() you estimate the scale from the fitted model. Then you
> just multiply the standard errors by the square root of the scale, use F
> tests instead of Chisq in anova(), etc.
>
> I am not sure (or actually, I don't think) that this fitting parallelism
> extends to the *truncated* Poisson that is used in pscl::hurdle(). Although
> you can fit in stages, and fit a quasipoisson() glm to the above-zero
> values, I don't think this is the correct thing to do when the count part
> is not allowed to produce new zeros. However, the truncated Poisson
> likelihood model is a huge improvement over hand-fitting a glm with
> iteratively reweighted least squares and assuming a constant variance/fit
> relationship.
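Jari's point about stage-wise fitting can be checked with a small simulation. The base-R sketch below (assumed setup: Poisson counts with true coefficients 0 and 1, zeros simply dropped) fits a quasipoisson glm() to the positive counts and compares it with the zero-truncated Poisson maximum likelihood fit. Because the positive counts have mean mu / (1 - exp(-mu)) rather than mu, the naive glm is expected to attenuate the slope, while the truncated fit should land near the true value.

```r
## Naive stage-wise fit vs zero-truncated Poisson ML on the positive counts
set.seed(4)
n <- 1000
x <- runif(n)
y <- rpois(n, exp(0 + 1 * x))  # assumed truth: intercept 0, slope 1
keep <- y > 0                  # the count part only sees above-zero values

## Stage-wise approach: ordinary quasipoisson glm on the positives, ignoring truncation
naive <- glm(y[keep] ~ x[keep], family = quasipoisson)

## Zero-truncated Poisson likelihood for the same observations
Xk <- cbind(1, x[keep])
negll <- function(b) {
  mu <- exp(drop(Xk %*% b))
  -sum(dpois(y[keep], mu, log = TRUE) -
         ppois(0, mu, lower.tail = FALSE, log.p = TRUE))
}
trunc_fit <- optim(unname(coef(naive)), negll, method = "BFGS")

rbind(naive = unname(coef(naive)), truncated = trunc_fit$par)
```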
>
> If you are worried about the overdispersion of the above-zero count
> data, use the truncated negative binomial model offered by
> pscl::hurdle() (dist = "negbin"). It is designed for that purpose (and
> has a more exciting narrative for ecologists).
>
> Cheers, Jari Oksanen
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology