[R-sig-ME] mixed zero-inflated Poisson regression model
Dieter Anseeuw
dieter.anseeuw at inagro.be
Wed Oct 8 16:37:49 CEST 2014
Dear all,
I have only limited experience in processing count data and would like to improve my statistical skills by addressing to the r-sig-mixed mailing list to handle the challenge (to me it is) below.
We did an experiment in which we would like to look at the effect of six treatments (+ 1 control) on the presence of leaf miners in Hippocastanum tree leaves. Unfortunately the data are unbalanced: for two treatments we looked at 10 trees; for the other 4 + the control we looked only at 5 trees. For each tree 100 leaves were collected for which the number of mines (created by the life miners) were counted. In total we had 47 trees, 100 leaves per tree, this brings up 4700 observations.
To analyse this dataset, I need to use a zero inflated Poisson regression model, but I also may need to include the factor 'tree' as a random effect.
According to what I found in the r-sig-mixed list for comparable problems, this is where I got with my current code:
(whereby tellingen=count observations on a leaf (4700 observations); boom=unique tree code (47 trees); behandeling=treatment (7 levels))
Poisson Regression:
summary(mod1<-glm(tellingen~behandeling, family="poisson", data=finalcounts))
Zero Inflated Poisson Regression:
summary(admod1<-glmmadmb(tellingen~behandeling, data=finalcounts, family="poisson", zeroInflation=TRUE))
Zero Inflated Poisson Regression, with tree as random effect:
summary(admixmod1<-glmmadmb(tellingen~behandeling, data=finalcounts, family="poisson", zeroInflation=TRUE, random=~(1|boom)))
The latter takes quite some computational time.
Any comments on my approach that may improve my insight are warmly welcomed.
Questions:
1. Which test should I use to compare these three models (to decide whether or not I should use the simple or a more complex model)
2. So far, I did not specifically take into account the difference in observed trees per treatment (10 versus 5 trees). Problematic? How can I address this?
3. How should I perform pairwise comparisons among treatments?
Just for your information, the summaries of the above mentioned models:
Call: glm(formula = tellingen ~ behandeling, family = "poisson", data = finalcounts)
Deviance Residuals:
Min 1Q Median 3Q Max
-4.6851 -2.0088 -0.9440 0.5938 18.5705
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.33537 0.01270 183.889 < 2e-16 ***
behandeling2lijmbandenonder -0.18921 0.01988 -9.518 < 2e-16 ***
behandeling4lijmbandenboven 0.06025 0.01770 3.404 0.000663 ***
behandelingcontrole -0.89223 0.02517 -35.445 < 2e-16 ***
behandelinglijm_feromoon -0.82509 0.02456 -33.601 < 2e-16 ***
behandelingtextiel200 -0.96470 0.02038 -47.343 < 2e-16 ***
behandelingtextiel80 -0.96293 0.02037 -47.282 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 35144 on 4699 degrees of freedom
Residual deviance: 29089 on 4693 degrees of freedom
AIC: 42931
Number of Fisher Scoring iterations: 5
XXXXXXXXXXXXXXX
Call: glmmadmb(formula = tellingen ~ behandeling, data = finalcounts,
family = "poisson", zeroInflation = TRUE)
AIC: 35565.4
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.06844 0.01724 119.97 <2e-16 ***
behandeling2lijmbandenonder 0.00801 0.02503 0.32 0.75
behandeling4lijmbandenboven 0.17392 0.02309 7.53 5e-14 ***
behandelingcontrole -0.48004 0.03054 -15.72 <2e-16 ***
behandelinglijm_feromoon -0.43062 0.02998 -14.36 <2e-16 ***
behandelingtextiel200 -0.63398 0.02649 -23.93 <2e-16 ***
behandelingtextiel80 -0.60413 0.02519 -23.98 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Number of observations: total=4700
Zero-inflation: 0.12999 (std. err.: 0.0051909 )
Log-likelihood: -17774.7
XXXXXXXXXXXXXXXXXXXX
Call: glmmadmb(formula = tellingen ~ behandeling, data = finalcounts,
family = "poisson", random = ~(1 | boom), zeroInflation = TRUE)
AIC: 32854.2
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.0883 0.1844 11.32 < 2e-16 ***
behandeling2lijmbandenonder -0.0238 0.2735 -0.09 0.93069
behandeling4lijmbandenboven 0.1715 0.2607 0.66 0.51059
behandelingcontrole -0.8631 0.2749 -3.14 0.00169 **
behandelinglijm_feromoon -0.5662 0.2743 -2.06 0.03901 *
behandelingtextiel200 -0.7800 0.2338 -3.34 0.00085 ***
behandelingtextiel80 -0.7402 0.2339 -3.16 0.00155 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Number of observations: total=4700, boom=47
Random effect variance(s):
Group=boom
Variance StdDev
(Intercept) 0.2037 0.4514
Zero-inflation: 0.092011 (std. err.: 0.0051551 )
Log-likelihood: -16418.1
Dieter Anseeuw, PhD
Aquaculture Research Scientist
Inagro vzw
Ieperseweg 87
B-8800 Rumbeke
Belgium
Direct phone: +32(0)51 273 373
http://www.linkedin.com/in/dieteranseeuw
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list