[R] Problem with zero-inflated negative binomial model in sediment river dynamics

Achim Zeileis Achim.Zeileis at uibk.ac.at
Wed Aug 14 21:32:18 CEST 2013


On Wed, 14 Aug 2013, Cade, Brian wrote:

> Z is correct, of course.  I was just being a little too simplistic in my 
> explanation trying to emphasize the reversal of signs of the 
> coefficients in the logistic regression part of the zero-inflated model.

When users ask me what the binary part of each of the two types of count 
models means, I always say:

- In the zero-inflation model, the binary model predicts the probability 
of _zero inflation_ (= excess zeros).

- In the hurdle model, the binary model predicts the probability for 
_hurdle crossing_ (= non-zero response).

To me this always seemed natural, even if the sign reversal in the 
zero-inflation model may be surprising at first sight...
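
To make the distinction concrete, here is a minimal base-R sketch. It plugs the coefficients from the zeroinfl() output quoted further down into the mixture formula P(Y = 0) = pi + (1 - pi) * f(0), where pi is the excess-zero probability and f is the negative binomial density. The discharge values are hypothetical and chosen only for illustration; nothing here is refitted to the actual data.

```r
# Coefficients copied from the zeroinfl() output quoted below
# (model: plate1 ~ discharge, dist = "negbin").
count_coef <- c(intercept = 2.557066, discharge = 0.064698)
zero_coef  <- c(intercept = 13.01011, discharge = -1.64293)
theta      <- 0.4604

# Hypothetical discharge values (assumption, for illustration only).
discharge <- c(6, 8, 10)

# Zero-inflation part (logit link): probability of an *excess* zero.
p_excess <- plogis(zero_coef[["intercept"]] +
                   zero_coef[["discharge"]] * discharge)

# Count part (log link): mean of the negative binomial component.
mu <- exp(count_coef[["intercept"]] +
          count_coef[["discharge"]] * discharge)

# Probability of a "random" zero from the count component.
p_nb_zero <- dnbinom(0, size = theta, mu = mu)

# Probability of observing a zero at all: both sources combined.
p_zero <- p_excess + (1 - p_excess) * p_nb_zero

print(round(cbind(discharge, p_excess, p_nb_zero, p_zero), 3))
```

With these numbers, p_excess falls as discharge rises (negative zero-inflation coefficient) while mu rises (positive count coefficient), so both parts of the model point in the same direction. p_zero stays above p_excess throughout because the count component contributes zeros of its own.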

hth,
Z

> Brian  
> 
> Brian S. Cade, PhD
> 
> U. S. Geological Survey
> Fort Collins Science Center
> 2150 Centre Ave., Bldg. C
> Fort Collins, CO  80526-8818
> 
> email:  cadeb at usgs.gov
> tel:  970 226-9326
> 
> 
> 
> On Wed, Aug 14, 2013 at 4:07 AM, Achim Zeileis <Achim.Zeileis at uibk.ac.at>
> wrote:
>       On Tue, 13 Aug 2013, Cade, Brian wrote:
>
>             Lauria:  For historical reasons the logistic
>             regression (binomial with
>             logit link) model portion of a zero-inflated count
>             model is usually
>             structured to predict the probability of the 0
>             counts rather than the
>             nonzero (>=1) counts so the coefficients will be the
>             negative of what you
>             expect based on the count model portion (as in your
>             output).  It is simple
>             to interpret the probability of the logistic
>             regression portion as the
>             probability of the nonzero counts by just taking the
>             negative of the
>             coefficient estimates provided for the probability
>             of the zero counts.
> 
>
>       This is a common misinterpretation but not quite correct.
>
>       The zero-inflation model is a mixture model of two components:
>       (1) a count component (Poisson, NB, ...), and (2) a zero mass
>       component (i.e., zero with probability 1). Hence, the observed
>       zeros in the data can come from both sources: either they are
>       "random" zeros from component (1) or "excess" zeros from
>       component (2).
>
>       The binomial zero-inflation part of the model predicts the
>       probability that a given observation belongs to component (2),
>       i.e., the probability of an "excess zero". But this is _not_ the
>       probability of observing a zero in the data (which is larger
>       than the excess zero probability).
>
>       If you want a model that first models zero vs. non-zero and
>       second the non-zero counts, use the hurdle model. This has
>       exactly the interpretation you describe above.
>
>       Best,
>       Z
>
>             Brian
>
>             Brian S. Cade, PhD
>
>             U. S. Geological Survey
>             Fort Collins Science Center
>             2150 Centre Ave., Bldg. C
>             Fort Collins, CO  80526-8818
>
>             email:  cadeb at usgs.gov <brian_cade at usgs.gov>
>             tel:  970 226-9326
> 
> 
>
>             On Tue, Aug 13, 2013 at 9:06 AM, Lauria, Valentina <
>             valentina.lauria at nuigalway.ie> wrote:
>
>                   Dear All,
>
>                   I am running a negative binomial model
>                   in R using the package pscl in order
>                   to estimate bed sediment movements
>                   versus river discharge. Currently we
>                   have deployed 4 different plates to test
>                   if a combination of more than one
>                   plate would better describe the sediment
>                   movements when the river discharge
>                   changes over time.
>
>                   My data are positively skewed and
>                   zero-inflated. I did run both
>                   zero-inflated Poisson and zero-inflated
>                   negative binomial regression and
>                   compared them using the Vuong test which
>                   showed that the negative binomial
>                   works better than a simple zero-inflated
>                   Poisson.
>
>                   My models look like:
> 
>
>                   1) plate1 ~ river discharge
>                   2) (plate 1 + plate 2) ~ river discharge
>                   3) (plate 1 + plate 2 + plate 3) ~ river discharge
>                   4) (plate 1 + plate 2 + plate 3 + plate 4) ~ river discharge
> 
>
>                   My main problem, as I am new to this
>                   type of model, is that I get a
>                   different sign for the coefficient of
>                   discharge in the output of the
>                   zero-inflated negative binomial model
>                   (please see below). What does this
>                   mean? Also how could I compare the
>                   different models (1-4) i.e. what tells
>                   me which is performing best? Thank you
>                   very much in advance for any
>                   comments and suggestions!!
>
>                   Kind Regards,
>                   Valentina
> 
>
>                   Call:
>                   zeroinfl(formula = plate1 ~ discharge, data = datafit_plates,
>                       dist = "negbin", EM = TRUE)
>
>                   Pearson residuals:
>                       Min      1Q  Median      3Q     Max
>                   -0.6770 -0.3564 -0.2101 -0.0814 12.3421
>
>                   Count model coefficients (negbin with log link):
>                                Estimate Std. Error z value Pr(>|z|)
>                   (Intercept)  2.557066   0.036593   69.88   <2e-16 ***
>                   discharge    0.064698   0.001983   32.63   <2e-16 ***
>                   Log(theta)  -0.775736   0.012451  -62.30   <2e-16 ***
>
>                   Zero-inflation model coefficients (binomial with logit link):
>                               Estimate Std. Error z value Pr(>|z|)
>                   (Intercept) 13.01011    0.22602   57.56   <2e-16 ***
>                   discharge   -1.64293    0.03092  -53.14   <2e-16 ***
>
>                   Theta = 0.4604
>                   Number of iterations in BFGS optimization: 1
>                   Log-likelihood: -6.933e+04 on 5 Df
>
>                           [[alternative HTML version
>                   deleted]]
>
>                   ______________________________________________
>                   R-help at r-project.org mailing list
>                   https://stat.ethz.ch/mailman/listinfo/r-help
>                   PLEASE do read the posting guide
>                   http://www.R-project.org/posting-guide.html
>                   and provide commented, minimal,
>                   self-contained, reproducible code.