[R] Problem with zero-inflated negative binomial model in sediment river dynamics
Achim Zeileis
Achim.Zeileis at uibk.ac.at
Wed Aug 14 21:32:18 CEST 2013
On Wed, 14 Aug 2013, Cade, Brian wrote:
> Z is correct, of course. I was just being a little too simplistic in my
> explanation trying to emphasize the reversal of signs of the
> coefficients in the logistic regression part of the zero-inflated model.
When users ask me what the binary part of each of the two types of count
models means, I always say:
- In the zero-inflation model, the binary model predicts the probability
of _zero inflation_ (= excess zeros).
- In the hurdle model, the binary model predicts the probability for
_hurdle crossing_ (= non-zero response).
To me this always seemed natural, even if the sign reversal in the
zero-inflation model may be surprising at first sight...
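
As a rough illustration of the first point, the following base-R sketch
plugs the coefficients from the zeroinfl() summary quoted further down in
this thread into the mixture formula. The discharge values are hypothetical
and chosen only to show the pattern; the variable names are mine, not part
of pscl.

```r
# Coefficients copied from the zeroinfl() summary quoted in this thread.
count_coef <- c(intercept = 2.557066, discharge = 0.064698)
zero_coef  <- c(intercept = 13.01011, discharge = -1.64293)
theta      <- 0.4604

# Hypothetical discharge values, chosen only for illustration.
discharge <- c(5, 8, 12)

# Mean of the negative binomial count component (log link).
mu <- exp(count_coef[["intercept"]] + count_coef[["discharge"]] * discharge)

# Probability of zero inflation, i.e. of belonging to the point mass
# at zero (logit link).
p_excess <- plogis(zero_coef[["intercept"]] + zero_coef[["discharge"]] * discharge)

# Probability of a "random" zero from the count component.
p_nb_zero <- dnbinom(0, size = theta, mu = mu)

# Overall probability of observing a zero: excess zeros plus random zeros.
p_zero <- p_excess + (1 - p_excess) * p_nb_zero

round(cbind(discharge, mu, p_excess, p_nb_zero, p_zero), 4)
```

Under these assumptions the excess-zero probability falls as discharge
rises (negative coefficient) while the count mean rises (positive
coefficient), and p_zero is never smaller than p_excess.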
hth,
Z
> Brian
>
> Brian S. Cade, PhD
>
> U. S. Geological Survey
> Fort Collins Science Center
> 2150 Centre Ave., Bldg. C
> Fort Collins, CO 80526-8818
>
> email: cadeb at usgs.gov
> tel: 970 226-9326
>
>
>
> On Wed, Aug 14, 2013 at 4:07 AM, Achim Zeileis <Achim.Zeileis at uibk.ac.at>
> wrote:
> On Tue, 13 Aug 2013, Cade, Brian wrote:
>
> Lauria: For historical reasons the logistic regression (binomial with
> logit link) model portion of a zero-inflated count model is usually
> structured to predict the probability of the 0 counts rather than the
> nonzero (>=1) counts so the coefficients will be the negative of what you
> expect based on the count model portion (as in your output). It is simple
> to interpret the probability of the logistic regression portion as the
> probability of the nonzero counts by just taking the negative of the
> coefficient estimates provided for the probability of the zero counts.
>
>
> This is a common misinterpretation but not quite correct.
>
> The zero-inflation model is a mixture model of two components:
> (1) a count component (Poisson, NB, ...), and (2) a zero mass
> component (i.e., zero with probability 1). Hence, the observed
> zeros in the data can come from both sources: either they are
> "random" zeros from component (1) or "excess" zeros from
> component (2).
>
> The binomial zero-inflation part of the model predicts the
> probability that a given observation belongs to component (2),
> i.e., the probability of an "excess zero". But this is _not_ the
> probability of observing a zero in the data (which is larger
> than the excess zero probability).
>
> If you want a model that first models zero vs. non-zero and
> second the non-zero counts, use the hurdle model. This has
> exactly the interpretation you describe above.
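
A minimal sketch of the hurdle interpretation, using simulated data (not
the sediment data from this thread) and base R only. It assumes a hurdle
model with a binomial/logit zero hurdle, whose binary part can be estimated
as an ordinary logistic regression for "response is non-zero"; all variable
names here are hypothetical.

```r
set.seed(1)

# Simulated data, purely illustrative.
n <- 2000
x <- runif(n)

# Zero-inflated counts: excess zeros become less likely as x grows,
# while the mean of the count component increases with x.
p_excess <- plogis(1 - 3 * x)
mu       <- exp(0.5 + 1 * x)
excess   <- rbinom(n, size = 1, prob = p_excess)
y        <- ifelse(excess == 1, 0, rnbinom(n, size = 1, mu = mu))

# Binary part of a hurdle model with a binomial/logit zero hurdle:
# a logistic regression for "response is non-zero".
hurdle_zero <- glm(I(y > 0) ~ x, family = binomial)
coef(hurdle_zero)

# The same regression for "response is zero" only flips the signs.
zero_reg <- glm(I(y == 0) ~ x, family = binomial)
coef(zero_reg)
```

Only in this zero vs. non-zero setting does negating the coefficients
switch between the two readings; the zero-inflation part of a mixture model
describes excess zeros, not all observed zeros.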
>
> Best,
> Z
>
> Brian
>
>
>
>
> On Tue, Aug 13, 2013 at 9:06 AM, Lauria, Valentina <
> valentina.lauria at nuigalway.ie> wrote:
>
> Dear All,
>
> I am running a negative binomial model in R using the package pscl in
> order to estimate bed sediment movements versus river discharge.
> Currently we have deployed 4 different plates to test if a combination
> of more than one plate would better describe the sediment movements when
> the river discharge changes over time.
>
> My data are positively skewed and zero-inflated. I ran both
> zero-inflated Poisson and zero-inflated negative binomial regressions and
> compared them using the Vuong test, which showed that the negative
> binomial works better than a simple zero-inflated Poisson.
>
> My models look like:
>
> 1) plate1 ~ river discharge
> 2) (plate 1 + plate 2) ~ river discharge
> 3) (plate 1 + plate 2 + plate 3) ~ river discharge
> 4) (plate 1 + plate 2 + plate 3 + plate 4) ~ river discharge
>
> My main problem, as I am new to this type of model, is that I get a
> different sign for the coefficient of discharge in the output of the
> zero-inflated negative binomial model (please see below). What does this
> mean? Also, how could I compare the different models (1-4), i.e., what
> tells me which is performing best? Thank you very much in advance for any
> comments and suggestions!!
>
> Kind Regards,
> Valentina
>
>
> Call:
> zeroinfl(formula = plate1 ~ discharge, data = datafit_plates,
>     dist = "negbin", EM = TRUE)
>
> Pearson residuals:
>     Min      1Q  Median      3Q     Max
> -0.6770 -0.3564 -0.2101 -0.0814 12.3421
>
> Count model coefficients (negbin with log link):
>              Estimate Std. Error z value Pr(>|z|)
> (Intercept)  2.557066   0.036593   69.88   <2e-16 ***
> discharge    0.064698   0.001983   32.63   <2e-16 ***
> Log(theta)  -0.775736   0.012451  -62.30   <2e-16 ***
>
> Zero-inflation model coefficients (binomial with logit link):
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept) 13.01011    0.22602   57.56   <2e-16 ***
> discharge   -1.64293    0.03092  -53.14   <2e-16 ***
>
> Theta = 0.4604
> Number of iterations in BFGS optimization: 1
> Log-likelihood: -6.933e+04 on 5 Df
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>