[R-sig-ME] mixed zero-inflated Poisson regression model

Wed Oct 8 16:37:49 CEST 2014

Dear all,
I have only limited experience in processing count data and would like to improve my statistical skills by addressing to the r-sig-mixed mailing list to handle the challenge (to me it is) below.

We did an experiment in which we would like to look at the effect of six treatments (+ 1 control) on the presence of leaf miners in Hippocastanum tree leaves.  Unfortunately the data are unbalanced: for two treatments we looked at 10 trees; for the other 4 + the control we looked only at 5 trees. For each tree 100 leaves were collected for which the number of mines (created by the life miners) were counted. In total we had 47 trees, 100 leaves per tree, this brings up 4700 observations.

To analyse this dataset, I need to use a zero inflated Poisson regression model, but I also may need to include the factor 'tree' as a random effect.

According to what I found in the r-sig-mixed list for comparable problems, this is where I got with my current code:
(whereby tellingen=count observations on a leaf (4700 observations); boom=unique tree code (47 trees); behandeling=treatment (7 levels))

Poisson Regression:
summary(mod1<-glm(tellingen~behandeling, family="poisson", data=finalcounts))

Zero Inflated Poisson Regression:
summary(admod1<-glmmadmb(tellingen~behandeling, data=finalcounts, family="poisson", zeroInflation=TRUE))

Zero Inflated Poisson Regression, with tree as random effect:
summary(admixmod1<-glmmadmb(tellingen~behandeling, data=finalcounts, family="poisson", zeroInflation=TRUE, random=~(1|boom)))
The latter takes quite some computational time.

Any comments on my approach that may improve my insight are warmly welcomed.
Questions:

1.       Which test should I use to compare these three models (to decide whether or not I should use the simple or a more complex model)

2.       So far, I did not specifically take into account the difference in observed trees per treatment (10 versus 5 trees). Problematic? How can I address this?

3.       How should I perform pairwise comparisons among treatments?

Just for your information, the summaries of the above mentioned models:

Call: glm(formula = tellingen ~ behandeling, family = "poisson", data = finalcounts)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.6851  -2.0088  -0.9440   0.5938  18.5705

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  2.33537    0.01270 183.889  < 2e-16 ***
behandeling2lijmbandenonder -0.18921    0.01988  -9.518  < 2e-16 ***
behandeling4lijmbandenboven  0.06025    0.01770   3.404 0.000663 ***
behandelingcontrole         -0.89223    0.02517 -35.445  < 2e-16 ***
behandelinglijm_feromoon    -0.82509    0.02456 -33.601  < 2e-16 ***
behandelingtextiel200       -0.96470    0.02038 -47.343  < 2e-16 ***
behandelingtextiel80        -0.96293    0.02037 -47.282  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 35144  on 4699  degrees of freedom
Residual deviance: 29089  on 4693  degrees of freedom
AIC: 42931

Number of Fisher Scoring iterations: 5

XXXXXXXXXXXXXXX

Call: glmmadmb(formula = tellingen ~ behandeling, data = finalcounts,
    family = "poisson", zeroInflation = TRUE)

AIC: 35565.4

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  2.06844    0.01724  119.97   <2e-16 ***
behandeling2lijmbandenonder  0.00801    0.02503    0.32     0.75
behandeling4lijmbandenboven  0.17392    0.02309    7.53    5e-14 ***
behandelingcontrole         -0.48004    0.03054  -15.72   <2e-16 ***
behandelinglijm_feromoon    -0.43062    0.02998  -14.36   <2e-16 ***
behandelingtextiel200       -0.63398    0.02649  -23.93   <2e-16 ***
behandelingtextiel80        -0.60413    0.02519  -23.98   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of observations: total=4700
Zero-inflation: 0.12999  (std. err.:  0.0051909 )

Log-likelihood: -17774.7

XXXXXXXXXXXXXXXXXXXX

Call: glmmadmb(formula = tellingen ~ behandeling, data = finalcounts,
    family = "poisson", random = ~(1 | boom), zeroInflation = TRUE)

AIC: 32854.2

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                   2.0883     0.1844   11.32  < 2e-16 ***
behandeling2lijmbandenonder  -0.0238     0.2735   -0.09  0.93069
behandeling4lijmbandenboven   0.1715     0.2607    0.66  0.51059
behandelingcontrole          -0.8631     0.2749   -3.14  0.00169 **
behandelinglijm_feromoon     -0.5662     0.2743   -2.06  0.03901 *
behandelingtextiel200        -0.7800     0.2338   -3.34  0.00085 ***
behandelingtextiel80         -0.7402     0.2339   -3.16  0.00155 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of observations: total=4700, boom=47
Random effect variance(s):
Group=boom
            Variance StdDev
(Intercept)   0.2037 0.4514

Zero-inflation: 0.092011  (std. err.:  0.0051551 )

Log-likelihood: -16418.1

Dieter Anseeuw, PhD
Aquaculture Research Scientist

Inagro vzw
Ieperseweg 87
B-8800 Rumbeke
Belgium
Direct phone: +32(0)51 273 373
http://www.linkedin.com/in/dieteranseeuw

	[[alternative HTML version deleted]]