[R] Zero-inflated regression models: predicting no 0s
Jean-Simon Michaud
michaud2 at interchange.ubc.ca
Wed Jun 1 02:39:47 CEST 2011
Hi all,
First post for me here, but I have been reading on the forum for almost two
years now. Thanks to everyone who contributed btw!
I have a dataset of 4000 observations of counts of a mammal, and I am trying
to predict abundance with a zero-inflated model, as there are quite a few
zeros in the response variable.
I have tried multiple options, but I might be doing something wrong, as every
time I look at the fitted values they do not include any 0s.
Here is what I tried so far:
"
## - hurdle from the package (pscl) - ##
> hurdle1 = hurdle(formula = mydata_purge2$TOT ~ mydata_purge2$LC80 +
mydata_purge2$LC231 + mydata_purge2$DEM, data = food, dist = "negbin",
zero.dist = "binomial")
> summary(hurdle1)
Call:
hurdle(formula = mydata_purge2$TOT ~ mydata_purge2$LC80 +
mydata_purge2$LC231 + mydata_purge2$DEM, data = food,
dist = "negbin", zero.dist = "binomial")
Pearson residuals:
Min 1Q Median 3Q Max
-1.0833 -0.7448 -0.2801 0.4296 6.7242
Count model coefficients (truncated negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.7841678 0.0923781 19.314 < 2e-16 ***
mydata_purge2$LC80 -2.5929984 0.4184956 -6.196 5.79e-10 ***
mydata_purge2$LC231 0.2154269 0.1171259 1.839 0.065875 .
mydata_purge2$DEM 0.0007708 0.0002064 3.735 0.000188 ***
Log(theta) 0.3742602 0.0390319 9.589 < 2e-16 ***
Zero hurdle model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.0602347 0.2302370 0.262 0.793614
mydata_purge2$LC80 -3.0590108 0.8360020 -3.659 0.000253 ***
mydata_purge2$LC231 1.7754441 0.3226731 5.502 3.75e-08 ***
mydata_purge2$DEM 0.0031943 0.0005307 6.020 1.75e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta: count = 1.4539
Number of iterations in BFGS optimization: 12
Log-likelihood: -1.251e+04 on 9 Df
## - zeroinfl from the package (pscl) - ##
> zip1A = zeroinfl(mydata_purge2$TOT ~ mydata_purge2$LC80 +
mydata_purge2$LC231 + mydata_purge2$DEM, data = food)
> summary(zip1A)
Call:
zeroinfl(formula = mydata_purge2$TOT ~ mydata_purge2$LC80 +
mydata_purge2$LC231 + mydata_purge2$DEM, data = food)
Pearson residuals:
Min 1Q Median 3Q Max
-2.2128 -1.2886 -0.5010 0.7594 11.8458
Count model coefficients (poisson with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.894e+00 3.547e-02 53.401 < 2e-16 ***
mydata_purge2$LC80 -2.249e+00 1.768e-01 -12.725 < 2e-16 ***
mydata_purge2$LC231 1.799e-01 4.492e-02 4.005 6.21e-05 ***
mydata_purge2$DEM 6.670e-04 7.687e-05 8.678 < 2e-16 ***
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.0593751 0.2308068 -0.257 0.796986
mydata_purge2$LC80 2.9428092 0.8523669 3.453 0.000555 ***
mydata_purge2$LC231 -1.7772101 0.3233166 -5.497 3.87e-08 ***
mydata_purge2$DEM -0.0031901 0.0005319 -5.997 2.01e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Number of iterations in BFGS optimization: 13
Log-likelihood: -1.727e+04 on 8 Df
> a1 = predict(zip1A)
> b1 = mydata_purge2$TOT
> plot(a1,b1)
"
Please find attached the plot for zip1A (which looks quite similar to the
one for hurdle1).
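To make the "no 0s in the fitted values" symptom concrete, here is a minimal
base-R sketch on simulated data (not my dataset). It assumes that the default
predict() for these models returns the expected count, which for a
zero-inflated Poisson is (1 - pi) * lambda and so is always above 0. The
covariate x and all coefficients below are made up for illustration only.

```r
# Sketch with simulated data; no packages needed.
set.seed(1)
n <- 1000
x <- runif(n)                    # hypothetical covariate
lambda <- exp(0.5 + 1.0 * x)     # mean of the Poisson (count) component
pi0 <- plogis(0.2 - 1.5 * x)     # probability of a structural zero

# Simulate zero-inflated Poisson counts
y <- ifelse(rbinom(n, 1, pi0) == 1, 0L, rpois(n, lambda))

# Expected count per observation: strictly positive, so never exactly 0
mu <- (1 - pi0) * lambda

# Probability that an observation equals 0 (structural or Poisson zero)
p_zero <- pi0 + (1 - pi0) * exp(-lambda)

any(mu == 0)     # are any expected counts exactly 0?
mean(y == 0)     # observed share of zeros
mean(p_zero)     # model-implied share of zeros
```

If that assumption holds, comparing the observed share of zeros with the
model-implied probability of a zero would be a fairer check than looking for
0s among the fitted means. If I read the pscl documentation correctly, the
first column of predict(zip1A, type = "prob") should give that probability
for each observation.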
Your help would be much appreciated,
Thanks,
JM.