[R-sig-eco] hurdle model

Gavin Simpson gavin.simpson at ucl.ac.uk
Thu Aug 19 13:54:01 CEST 2010


On Thu, 2010-08-19 at 13:20 +0200, Yingjie Zhang wrote:
> Thanks for the details. The paper is 'Comparing species abundance
> models' by Joanne M. Potts and Jane Elith. See the link below: on page
> 158, in the table, they compare 5 models, and both quasi-likelihood and
> hurdle models are mentioned.
> 
> http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6VBS-4KD5C2N-1&_user=794998&_coverDate=11%2F16%2F2006&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1435498227&_rerunOrigin=google&_acct=C000043466&_version=1&_urlVersion=0&_userid=794998&md5=fc0c4ebc77917948c90f8f0ee3bbe141
> 
> Maybe we went too far. When I read the paper above, I just thought it
> would be interesting to try the method they mentioned. My data have
> both characteristics: excess 0s and overdispersion in the positive part.
> And I am quite convinced that the 0s have a single source, which is
> why I didn't use ZIP/ZINB.
> 
> Perhaps, given excess 0s, overdispersion and a single source of 0s, the
> best model is a hurdle model with a truncated negative binomial, but my
> aim is to find out which estimation method the hurdle model uses.

They fit several models and compare them:

     I. Poisson
    II. Negative Binomial
   III. Quasi-likelihood
    IV. Hurdle model
     V. Zero-inflated model

III should be a quasi-Poisson model, i.e. you fit the Poisson GLM using
quasi-likelihood and estimate the dispersion parameter \phi alongside the
usual Poisson GLM parameters.
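A minimal base-R sketch of model III, assuming simulated overdispersed counts (the data-generating values are arbitrary and not taken from the paper). It shows that the quasi-Poisson fit returns the same coefficients as the Poisson GLM, while \phi is estimated from the Pearson statistic and inflates the standard errors by \sqrt{\phi}.

```r
## Simulated counts that are overdispersed relative to a Poisson.
## All numbers here are arbitrary illustration values (assumption).
set.seed(1)
n <- 200
x <- runif(n)
y <- rnbinom(n, mu = exp(0.5 + 1.2 * x), size = 1.5)

fit_pois  <- glm(y ~ x, family = poisson)       # model I
fit_quasi <- glm(y ~ x, family = quasipoisson)  # model III

## Identical point estimates: the quasi-likelihood estimating
## equations are the Poisson score equations.
cbind(poisson = coef(fit_pois), quasi = coef(fit_quasi))

## phi is estimated from the Pearson statistic:
## phi_hat = sum(Pearson residuals^2) / residual df
phi_hat <- sum(residuals(fit_quasi, type = "pearson")^2) /
  df.residual(fit_quasi)
c(manual = phi_hat, summary = summary(fit_quasi)$dispersion)

## Standard errors are scaled up by sqrt(phi)
se_ratio <- sqrt(diag(vcov(fit_quasi))) / sqrt(diag(vcov(fit_pois)))
se_ratio
```

The manual and `summary()` estimates of \phi should agree closely; `summary.glm` works from the final IRLS working residuals and weights, so tiny numerical differences are possible.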

Section 2.3 of their paper, on the hurdle model, doesn't even mention
"quasi", though they do mention it in Table 2.

Reading this, I think they cooked up this model themselves: you can fit a
binomial model for presence/absence yourself and then fit a count model
to the samples predicted to be present by the binomial part. To keep
things simple I suspect they fitted the count part as a quasi-Poisson
model, but nowhere does the paper say exactly what they did.
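A sketch of the kind of hand-made two-part fit described above, assuming simulated data. This is only a guess at the approach, not the authors' documented method; for simplicity the count part here is fitted to the observed non-zero samples. A binomial GLM models presence/absence and a quasi-Poisson GLM models the positive counts. The count part is not zero-truncated, which is one reason such an ad hoc fit is not a proper hurdle likelihood.

```r
## Simulated data (arbitrary values, assumption)
set.seed(2)
n <- 300
x <- runif(n)
present <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))
## Counts >= 1 for present samples (a shifted NB, an arbitrary choice)
y <- present * (1 + rnbinom(n, mu = exp(0.3 + x), size = 2))
dat <- data.frame(y = y, x = x, present = as.integer(y > 0))

## Part 1: presence/absence
fit_zero <- glm(present ~ x, family = binomial, data = dat)

## Part 2: counts for the non-zero samples only, by quasi-likelihood
fit_count <- glm(y ~ x, family = quasipoisson, data = subset(dat, y > 0))

## Combined mean: E[Y] = P(Y > 0) * E[Y | Y > 0]
mu_hat <- predict(fit_zero, newdata = dat, type = "response") *
  predict(fit_count, newdata = dat, type = "response")
summary(mu_hat)
```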

You would be better off fitting the hurdle model, as I mentioned, using
hurdle() in pscl; fitting models by quasi-likelihood is just asking for
trouble when proper likelihood options are available.
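A sketch of that route, assuming the pscl package is installed and using its bundled bioChemists data (response `art`; the covariates `fem` and `ment` are an arbitrary choice) purely as stand-in data. Both hurdle() and zeroinfl() accept `dist = "negbin"` for the count part, and because they are fitted by maximum likelihood they have log-likelihoods and therefore AIC values.

```r
## Requires the 'pscl' package; the guard skips the example if it is absent.
if (requireNamespace("pscl", quietly = TRUE)) {
  data("bioChemists", package = "pscl")

  ## Hurdle: binomial zero hurdle + zero-truncated negative binomial counts
  fm_hnb <- pscl::hurdle(art ~ fem + ment, data = bioChemists,
                         dist = "negbin", zero.dist = "binomial")
  ## Zero-inflated negative binomial, for comparison
  fm_znb <- pscl::zeroinfl(art ~ fem + ment, data = bioChemists,
                           dist = "negbin")

  print(summary(fm_hnb))
  ## Proper likelihoods, so information criteria are available
  print(AIC(fm_hnb, fm_znb))
}
```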

Read the vignette that accompanies the pscl package for details of how
it fits the various models including the hurdle. This includes the
likelihood functions that are optimised as part of the fitting.
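To make the "proper log likelihood" concrete, here is a minimal base-R sketch of the kind of log-likelihood a binomial-logit / zero-truncated negative binomial hurdle maximises. It assumes simulated data and a hypothetical helper `hurdle_nll()`; it is not pscl's code, and pscl::hurdle() is the tool to use in practice. Because the likelihood splits into a binary part and a truncated count part, the zero-hurdle estimates should coincide with an ordinary logistic regression on y > 0, which gives a simple check.

```r
## Hypothetical helper: negative log-likelihood of a hurdle model with a
## logit-binomial zero hurdle and a zero-truncated negative binomial count part.
hurdle_nll <- function(par, y, X, Z) {
  kz <- ncol(Z); kx <- ncol(X)
  gamma <- par[seq_len(kz)]          # zero-hurdle coefficients
  beta  <- par[kz + seq_len(kx)]     # count-part coefficients (log link)
  theta <- exp(par[kz + kx + 1])     # NB size parameter, kept positive
  pos <- y > 0
  eta_z <- drop(Z %*% gamma)
  mu <- exp(drop(X %*% beta))
  ## Binary part: log P(Y = 0) for zeros, log P(Y > 0) for positives
  ll_zero <- sum(plogis(eta_z[!pos], lower.tail = FALSE, log.p = TRUE)) +
    sum(plogis(eta_z[pos], log.p = TRUE))
  ## Count part: NB log-density renormalised by log P(Y > 0) under the NB
  ll_count <- sum(dnbinom(y[pos], size = theta, mu = mu[pos], log = TRUE) -
    pnbinom(0, size = theta, mu = mu[pos], lower.tail = FALSE, log.p = TRUE))
  -(ll_zero + ll_count)
}

## Simulated data (arbitrary values, assumption)
set.seed(3)
n <- 400
x <- runif(n)
X <- Z <- cbind(1, x)
present <- rbinom(n, 1, plogis(-0.3 + 1.2 * x))
## Zero-truncated NB draws by simple rejection sampling
draw_ztnb <- function(mu, size) {
  repeat {
    v <- rnbinom(1, size = size, mu = mu)
    if (v > 0) return(v)
  }
}
y <- ifelse(present == 1,
            sapply(exp(0.5 + 0.8 * x), draw_ztnb, size = 1.5), 0)

fit <- optim(rep(0, 5), hurdle_nll, y = y, X = X, Z = Z,
             method = "BFGS", control = list(maxit = 500))
fit$par     # gamma (2), beta (2), log(theta)
-fit$value  # maximised log-likelihood

## The zero part should match a plain logistic regression on y > 0
coef(glm(as.integer(y > 0) ~ x, family = binomial))
```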

HTH

G

> 
> Cheers,
> Yingjie  
> 
> On 19 Aug 2010, at 11:49, Gavin Simpson wrote:
> 
> > On Thu, 2010-08-19 at 11:14 +0200, Yingjie Zhang wrote:
> >> Hi,
> >> 
> >> There is a reason why I am attached to quasi-likelihood: hurdle()
> >> from 'pscl' uses zero-truncated Poisson regression for the non-zero
> >> part, which is incapable of handling the overdispersion that comes from
> >> the positive part of the data. Apparently, quasi-likelihood is at least
> >> a better choice. I've noticed the hurdle model they used for the paper
> >> comes from package 'stats' instead of 'pscl', but I didn't find this
> >> version of hurdle in R...
> > 
> > Quasi-likelihood isn't what solves the "overdispersion comes from the
> > positive part" problem. It is a means of fitting models, just like
> > maximum likelihood etc. It is the authors' model that accounts for the
> > overdispersion; they estimate the parameters of this model using
> > quasi-likelihood.
> > 
> > Your claim about hurdle in stats is incorrect:
> > 
> >> getAnywhere(hurdle)
> > no object named ‘hurdle’ was found
> >> getAnywhere("hurdle")
> > no object named ‘hurdle’ was found
> > 
> > So they must be using something else. Here's a thought: why not give us
> > the reference/citation for the paper you are reading? It is difficult
> > to speculate further without more details, such as the actual paper.
> > 
> > Hurdle models fit a point mass at zero, whilst the count part of the
> > model is truncated so that it cannot produce any further zeros.
> > 
> > A zero-inflated model (zeroinfl() in pscl) fits a point mass at zero and
> > has an untruncated count model, which allows extra zeros to be produced.
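The two zero mechanisms can be compared directly through their probability mass functions. A minimal sketch, assuming a Poisson count distribution with constant parameters; `dhurdle_pois()` and `dzip()` are hypothetical helpers, and `pi0` is the probability attached to the point mass in each model.

```r
## Hurdle: every zero comes from the point mass; the counts are zero-truncated
dhurdle_pois <- function(k, pi0, lambda) {
  ifelse(k == 0, pi0,
         (1 - pi0) * dpois(k, lambda) / (1 - dpois(0, lambda)))
}

## Zero-inflated: zeros come from the point mass and from the count model
dzip <- function(k, pi0, lambda) {
  ifelse(k == 0, pi0 + (1 - pi0) * dpois(0, lambda),
         (1 - pi0) * dpois(k, lambda))
}

k <- 0:50
p_h <- dhurdle_pois(k, pi0 = 0.4, lambda = 2)
p_z <- dzip(k, pi0 = 0.4, lambda = 2)
c(hurdle_P0 = p_h[1], zip_P0 = p_z[1])  # 0.4 vs 0.4 + 0.6 * exp(-2)
c(sum(p_h), sum(p_z))                   # both sum to (numerically) 1
```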
> > 
> > In both cases a negative binomial model may be fitted to the count part
> > (dist = "negbin"), which may be sufficient to cope with any remaining
> > overdispersion in the count part of your model.
> > 
> > I think you would be better off thinking about where the overdispersion
> > is coming from and choosing an appropriate means to model it. You are
> > being blinded by this talk of quasi-likelihoods. There may well be a way
> > of fitting the model you want in R without resorting to quasi-likelihood
> > tricks. But as you haven't told us what model you want to fit or given a
> > citation for the paper you want to replicate, there isn't much further
> > we can do.
> > 
> > HTH
> > 
> > G
> > 
> >> 
> >> On 19 Aug 2010, at 10:55, Gavin Simpson wrote:
> >> 
> >>> On Thu, 2010-08-19 at 10:30 +0200, Yingjie Zhang wrote:
> >>>> I'd like to try the same approach on my dataset: a hurdle model, but
> >>>> estimated by 'quasi-likelihood'. But I think it's not in the standard
> >>>> 'pscl' package, right?
> >>> 
> >>> Please keep discussion on list; just because I replied doesn't give you
> >>> a direct line to my inbox...
> >>> 
> >>> Why would you want a quasi-likelihood when you could have the real
> >>> thing? Seriously, if there is no likelihood you can't do likelihood
> >>> ratio tests, compare models using AIC/BIC etc.
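A quick base-R illustration of that point, assuming simulated data: the Poisson GLMs have a log-likelihood, so AIC() and a likelihood-ratio test are available, whereas the quasi-Poisson fit reports NA for AIC because there is no likelihood behind it.

```r
## Simulated Poisson counts (arbitrary values, assumption)
set.seed(4)
n <- 150
x1 <- runif(n)
x2 <- runif(n)
y <- rpois(n, exp(0.2 + x1))

m0 <- glm(y ~ x1, family = poisson)
m1 <- glm(y ~ x1 + x2, family = poisson)
q1 <- glm(y ~ x1 + x2, family = quasipoisson)

AIC(m0, m1)                  # finite values
anova(m0, m1, test = "LRT")  # likelihood-ratio (deviance) test
AIC(q1)                      # NA: no likelihood, so no AIC
```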
> >>> 
> >>> Just use hurdle() if it fits the form of model you are after. Don't
> >>> worry about likelihoods, quasi or otherwise. Check that you are happy
> >>> with the range of models you could fit with hurdle() and use it. If you
> >>> aren't happy then you'd need to look elsewhere, but don't get hung up on
> >>> the quasi-likelihood bit.
> >>> 
> >>> My Tuppence,
> >>> 
> >>> G
> >>> 
> >>>> 
> >>>> 
> >>>> On 19 Aug 2010, at 10:17, Gavin Simpson wrote:
> >>>> 
> >>>>> On Thu, 2010-08-19 at 09:52 +0200, Yingjie Zhang wrote:
> >>>>>> Hello everyone,
> >>>>>> 
> >>>>>> Is anyone using hurdle models? I am reading a paper which says
> >>>>>> "Hurdle model removes effect of zero-inflation and over-dispersion in
> >>>>>> the non-zero observations using a quasi-likelihood". I've checked the
> >>>>>> help file for hurdle in R, which says something different: "for
> >>>>>> non-zero obs normally a truncated poisson/NB is used". I just want to
> >>>>>> make sure: does it really estimate the parameters by "quasi-likelihood"?
> >>>>>> 
> >>>>>> Thanks,
> >>>>>> Yingjie Zhang
> >>>>>> Biostatistician
> >>>>> 
> >>>>> The authors of that paper might have fitted their hurdle model using
> >>>>> quasi-likelihood, but that is not, AFAICT, what is used in the hurdle()
> >>>>> function in package 'pscl', which maximises a proper log-likelihood.
> >>>>> 
> >>>>> But it's hard to say from what you have provided.
> >>>>> 
> >>>>> G
> >>>>> -- 
> >>>>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> >>>>> Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> >>>>> ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> >>>>> Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> >>>>> Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> >>>>> UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> >>>>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> >>>>> 
> >>>> 
> >>> 
> >>> 
> >> 
> > 
> > 
> 




More information about the R-sig-ecology mailing list