[R] Zero-inflated regression models: predicting no 0s

Jean-Simon Michaud michaud2 at interchange.ubc.ca
Wed Jun 1 02:39:47 CEST 2011


Hi all, 

This is my first post here, although I have been reading the list for almost
two years. Thanks to everyone who contributes!

I have a dataset of 4000 observations of counts of a mammal species, and I am
trying to predict abundance with a zero-inflated model, since the response
variable contains a lot of zeros. I have tried several options, but I may be
doing something wrong: every time I look at the fitted values, none of them
is 0.
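For reference, here are the standard forms of the two model types. The notation is mine, not from the post: pi_i is the probability of a structural zero in a zero-inflation model, p_i is the probability of crossing the hurdle in a hurdle model, and f(.; mu_i) is the Poisson or negative binomial density with mean mu_i. The "fitted value" in both cases is the expected count E[Y_i]. Because mu_i > 0 and logit-link probabilities lie strictly between 0 and 1, E[Y_i] is always positive. Zeros enter the model only through P(Y_i = 0).

```latex
\begin{aligned}
\text{Zero-inflated:}\quad P(Y_i = 0) &= \pi_i + (1 - \pi_i)\, f(0; \mu_i),
  & E[Y_i] &= (1 - \pi_i)\, \mu_i, \\
\text{Hurdle:}\quad P(Y_i = 0) &= 1 - p_i,
  & E[Y_i] &= \frac{p_i\, \mu_i}{1 - f(0; \mu_i)},
\end{aligned}
\qquad \mu_i = \exp(x_i^\top \beta) > 0 .
```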

 

Here is what I tried so far: 

 

"

## - hurdle from the package (pscl) - ##

 

> hurdle1 = hurdle(formula = mydata_purge2$TOT ~ mydata_purge2$LC80 +
    mydata_purge2$LC231 + mydata_purge2$DEM, data = food, dist = "negbin",
    zero.dist = "binomial")

> summary(hurdle1)

 

Call:
hurdle(formula = mydata_purge2$TOT ~ mydata_purge2$LC80 +
    mydata_purge2$LC231 + mydata_purge2$DEM, data = food,
    dist = "negbin", zero.dist = "binomial")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-1.0833 -0.7448 -0.2801  0.4296  6.7242 

Count model coefficients (truncated negbin with log link):
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)          1.7841678  0.0923781  19.314  < 2e-16 ***
mydata_purge2$LC80  -2.5929984  0.4184956  -6.196 5.79e-10 ***
mydata_purge2$LC231  0.2154269  0.1171259   1.839 0.065875 .  
mydata_purge2$DEM    0.0007708  0.0002064   3.735 0.000188 ***
Log(theta)           0.3742602  0.0390319   9.589  < 2e-16 ***
Zero hurdle model coefficients (binomial with logit link):
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)          0.0602347  0.2302370   0.262 0.793614    
mydata_purge2$LC80  -3.0590108  0.8360020  -3.659 0.000253 ***
mydata_purge2$LC231  1.7754441  0.3226731   5.502 3.75e-08 ***
mydata_purge2$DEM    0.0031943  0.0005307   6.020 1.75e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta: count = 1.4539
Number of iterations in BFGS optimization: 12 
Log-likelihood: -1.251e+04 on 9 Df

 

 

## - zeroinfl from the package (pscl) - ##

 

> zip1A = zeroinfl(mydata_purge2$TOT ~ mydata_purge2$LC80 +
    mydata_purge2$LC231 + mydata_purge2$DEM, data = food)

 

> summary(zip1A)

 

Call:
zeroinfl(formula = mydata_purge2$TOT ~ mydata_purge2$LC80 +
    mydata_purge2$LC231 + mydata_purge2$DEM, data = food)

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-2.2128 -1.2886 -0.5010  0.7594 11.8458 

Count model coefficients (poisson with log link):
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)          1.894e+00  3.547e-02  53.401  < 2e-16 ***
mydata_purge2$LC80  -2.249e+00  1.768e-01 -12.725  < 2e-16 ***
mydata_purge2$LC231  1.799e-01  4.492e-02   4.005 6.21e-05 ***
mydata_purge2$DEM    6.670e-04  7.687e-05   8.678  < 2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)         -0.0593751  0.2308068  -0.257 0.796986    
mydata_purge2$LC80   2.9428092  0.8523669   3.453 0.000555 ***
mydata_purge2$LC231 -1.7772101  0.3233166  -5.497 3.87e-08 ***
mydata_purge2$DEM   -0.0031901  0.0005319  -5.997 2.01e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 13 
Log-likelihood: -1.727e+04 on 8 Df

 

> a1 = predict(zip1A)
> b1 = mydata_purge2$TOT
> plot(a1,b1)

 

"
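A side note on the calls above. They put `mydata_purge2$...` terms in the formula while passing `data = food`, so the variables are pulled from `mydata_purge2` directly, the `data` argument is effectively bypassed, and every coefficient name carries the `mydata_purge2$` prefix. The usual pscl idiom is to use bare column names and pass the data frame through `data`. Both `hurdle()` and `zeroinfl()` also accept a two-part formula, `y ~ count regressors | zero regressors`. Below is a minimal sketch on simulated data. The data frame `sim`, its made-up coefficients, and the object names `hurdle_nb` and `zip_fit` are hypothetical; only the column names TOT, LC80, LC231 and DEM come from the post.

```r
# Hypothetical stand-in for mydata_purge2 (simulated); only the column
# names TOT, LC80, LC231 and DEM are taken from the post.
library(pscl)

set.seed(42)
n <- 4000
sim <- data.frame(LC80  = runif(n),
                  LC231 = runif(n),
                  DEM   = runif(n, 0, 1500))

# Made-up coefficients: structural zeros with probability pi_zero,
# otherwise a negative binomial count with mean mu.
pi_zero <- plogis(0.5 + 2 * sim$LC80 - 1.5 * sim$LC231 - 0.002 * sim$DEM)
mu      <- exp(1.8 - 2 * sim$LC80 + 0.2 * sim$LC231 + 0.0007 * sim$DEM)
sim$TOT <- ifelse(runif(n) < pi_zero, 0, rnbinom(n, size = 1.5, mu = mu))

# Bare column names in the formula, data frame passed via 'data'.
# The terms after '|' give the zero part explicitly (same regressors
# here, so this matches the single-part formulas in the post).
hurdle_nb <- hurdle(TOT ~ LC80 + LC231 + DEM | LC80 + LC231 + DEM,
                    data = sim, dist = "negbin", zero.dist = "binomial")
zip_fit   <- zeroinfl(TOT ~ LC80 + LC231 + DEM | LC80 + LC231 + DEM,
                      data = sim)  # default dist = "poisson", as in zip1A

summary(hurdle_nb)  # coefficient rows are now labelled LC80, LC231, DEM
summary(zip_fit)
```

Changing the terms after `|` alters only the zero component. If a negative binomial counterpart to `hurdle1` is wanted on the zero-inflated side, `zeroinfl()` also takes `dist = "negbin"`.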

 

Please find attached the plot for zip1A (it looks quite similar to the one
for hurdle1).
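On the plot itself: in pscl, `predict()` for `zeroinfl` (and `hurdle`) fits defaults to `type = "response"`, which returns the fitted mean, i.e. the expected count. Since the count mean is positive and the zero probability is strictly below 1, that mean is never exactly 0, so a scatter of fitted values against observed counts will not show predicted zeros even when the model captures the zeros well. Zeros appear in the predicted probabilities instead: `type = "prob"` returns a matrix with one column per count 0, 1, ..., max(y), and its first column is P(Y_i = 0). The minimal sketch below uses simulated data; `sim`, `fit`, and the coefficients are hypothetical, and only the column names come from the post.

```r
# Hypothetical simulated data standing in for mydata_purge2; the
# coefficients are made up, only the column names come from the post.
library(pscl)

set.seed(7)
n <- 4000
sim <- data.frame(LC80 = runif(n), LC231 = runif(n), DEM = runif(n, 0, 1500))
pi_zero <- plogis(2 * sim$LC80 - 1.5 * sim$LC231 - 0.002 * sim$DEM)
mu      <- exp(1.8 - 2 * sim$LC80 + 0.0007 * sim$DEM)
sim$TOT <- ifelse(runif(n) < pi_zero, 0, rpois(n, mu))

# Zero-inflated Poisson fit with the same structure as zip1A.
fit <- zeroinfl(TOT ~ LC80 + LC231 + DEM, data = sim)

# Default type = "response": expected counts (1 - pi_i) * mu_i, all > 0.
mean_hat <- predict(fit, type = "response")
summary(mean_hat)

# type = "prob": one column per count 0, 1, ..., max(TOT);
# the first column is the predicted P(TOT_i = 0).
prob_hat <- predict(fit, type = "prob")
p0_hat   <- prob_hat[, 1]

# Expected number of zeros under the model vs. zeros actually observed.
c(expected = sum(p0_hat), observed = sum(sim$TOT == 0))

# One alternative plot: predicted P(TOT = 0), split by whether
# a zero was actually observed.
boxplot(p0_hat ~ factor(sim$TOT == 0, labels = c("TOT > 0", "TOT = 0")),
        xlab = "", ylab = "Predicted P(TOT = 0)")
```

Comparing `sum(p0_hat)` with the observed number of zeros is a quick check on whether the model reproduces the excess zeros. The same `type` values are accepted by `predict()` for `hurdle` objects.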

 

Your help would be much appreciated, 

 

Thanks, 

 

JM.

 



More information about the R-help mailing list