[R-sig-ME] mixed zero-inflated Poisson regression model

Thu Oct 9 12:17:10 CEST 2014

Hi Dieter,

1. Since you have a design with repeated measures you should include the tree effect. Hence the first two models are not consistent with your design. Leaving only one model, so no need to model comparison ;-)
2. Treatments with more trees will have more observations, giving more precise estimates. You don't need to correct for the number of trees per observation.
3. Have a look at the multcomp package. I have used it to do multiple comparisons with several models (lm, glm, lmer, glmer). I haven't tried it with glmmadmb.

PS: I would recode the treatment factor so that the control is used as  reference.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx op inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Oorspronkelijk bericht-----
Van: r-sig-mixed-models-bounces op r-project.org [mailto:r-sig-mixed-models-bounces op r-project.org] Namens Dieter Anseeuw
Verzonden: woensdag 8 oktober 2014 16:38
Aan: r-sig-mixed-models op r-project.org
Onderwerp: [R-sig-ME] mixed zero-inflated Poisson regression model

Dear all,
I have only limited experience in processing count data and would like to improve my statistical skills by addressing to the r-sig-mixed mailing list to handle the challenge (to me it is) below.

We did an experiment in which we would like to look at the effect of six treatments (+ 1 control) on the presence of leaf miners in Hippocastanum tree leaves.  Unfortunately the data are unbalanced: for two treatments we looked at 10 trees; for the other 4 + the control we looked only at 5 trees. For each tree 100 leaves were collected for which the number of mines (created by the life miners) were counted. In total we had 47 trees, 100 leaves per tree, this brings up 4700 observations.

To analyse this dataset, I need to use a zero inflated Poisson regression model, but I also may need to include the factor 'tree' as a random effect.

According to what I found in the r-sig-mixed list for comparable problems, this is where I got with my current code:
(whereby tellingen=count observations on a leaf (4700 observations); boom=unique tree code (47 trees); behandeling=treatment (7 levels))

Poisson Regression:
summary(mod1<-glm(tellingen~behandeling, family="poisson", data=finalcounts))

Zero Inflated Poisson Regression:
summary(admod1<-glmmadmb(tellingen~behandeling, data=finalcounts, family="poisson", zeroInflation=TRUE))

Zero Inflated Poisson Regression, with tree as random effect:
summary(admixmod1<-glmmadmb(tellingen~behandeling, data=finalcounts, family="poisson", zeroInflation=TRUE, random=~(1|boom))) The latter takes quite some computational time.

Any comments on my approach that may improve my insight are warmly welcomed.
Questions:

1.       Which test should I use to compare these three models (to decide whether or not I should use the simple or a more complex model)

2.       So far, I did not specifically take into account the difference in observed trees per treatment (10 versus 5 trees). Problematic? How can I address this?

3.       How should I perform pairwise comparisons among treatments?

Just for your information, the summaries of the above mentioned models:

Call: glm(formula = tellingen ~ behandeling, family = "poisson", data = finalcounts)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.6851  -2.0088  -0.9440   0.5938  18.5705

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  2.33537    0.01270 183.889  < 2e-16 ***
behandeling2lijmbandenonder -0.18921    0.01988  -9.518  < 2e-16 ***
behandeling4lijmbandenboven  0.06025    0.01770   3.404 0.000663 ***
behandelingcontrole         -0.89223    0.02517 -35.445  < 2e-16 ***
behandelinglijm_feromoon    -0.82509    0.02456 -33.601  < 2e-16 ***
behandelingtextiel200       -0.96470    0.02038 -47.343  < 2e-16 ***
behandelingtextiel80        -0.96293    0.02037 -47.282  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 35144  on 4699  degrees of freedom Residual deviance: 29089  on 4693  degrees of freedom
AIC: 42931

Number of Fisher Scoring iterations: 5

XXXXXXXXXXXXXXX

Call: glmmadmb(formula = tellingen ~ behandeling, data = finalcounts,
    family = "poisson", zeroInflation = TRUE)

AIC: 35565.4

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  2.06844    0.01724  119.97   <2e-16 ***
behandeling2lijmbandenonder  0.00801    0.02503    0.32     0.75
behandeling4lijmbandenboven  0.17392    0.02309    7.53    5e-14 ***
behandelingcontrole         -0.48004    0.03054  -15.72   <2e-16 ***
behandelinglijm_feromoon    -0.43062    0.02998  -14.36   <2e-16 ***
behandelingtextiel200       -0.63398    0.02649  -23.93   <2e-16 ***
behandelingtextiel80        -0.60413    0.02519  -23.98   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of observations: total=4700
Zero-inflation: 0.12999  (std. err.:  0.0051909 )

Log-likelihood: -17774.7

XXXXXXXXXXXXXXXXXXXX

Call: glmmadmb(formula = tellingen ~ behandeling, data = finalcounts,
    family = "poisson", random = ~(1 | boom), zeroInflation = TRUE)

AIC: 32854.2

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                   2.0883     0.1844   11.32  < 2e-16 ***
behandeling2lijmbandenonder  -0.0238     0.2735   -0.09  0.93069
behandeling4lijmbandenboven   0.1715     0.2607    0.66  0.51059
behandelingcontrole          -0.8631     0.2749   -3.14  0.00169 **
behandelinglijm_feromoon     -0.5662     0.2743   -2.06  0.03901 *
behandelingtextiel200        -0.7800     0.2338   -3.34  0.00085 ***
behandelingtextiel80         -0.7402     0.2339   -3.16  0.00155 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of observations: total=4700, boom=47 Random effect variance(s):
Group=boom
            Variance StdDev
(Intercept)   0.2037 0.4514

Zero-inflation: 0.092011  (std. err.:  0.0051551 )

Log-likelihood: -16418.1

Dieter Anseeuw, PhD
Aquaculture Research Scientist

Inagro vzw
Ieperseweg 87
B-8800 Rumbeke
Belgium
Direct phone: +32(0)51 273 373
http://www.linkedin.com/in/dieteranseeuw

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models op r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document.
The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.