[R-sig-ME] Zero-inflated binomial (ZIB) models in glmmADMB: warnings and errors
Stefan Ferger
stefan.ferger at yahoo.de
Wed Oct 23 16:48:42 CEST 2013
Dear list members,
I have two problems with fitting a zero-inflated glmm with a binomial distribution (ZIB). None of both seems to be new, but the suggestions I found in the mailing list so far didn't help me to solve the problems.
I briefly try to describe my experimental design, so that you know what it is about:
I have 13 plots; each plot has 10 subplots (called clusters here); within each cluster are 20 fruits of 3 colors each. The whole experiment was replicated in 2 seasons. The response variable are fruits that are pecked vs. not pecked, i.e. the number of success vs. the number of failures per cluster (within plot) and color and season, i.e. a binomial response.
The success-part of this binomial response is strongly, but quite evenly zero-inflated, i.e. there is no obvious pattern in the distribution of zeros in successes between seasons or colors:
> table(fruits$pecks, fruits$color, fruits$season) , , = cold black red violet 0 86 90 95 1 23 18 17 2 10 11 6 3 7 3 3 4 3 3 3 5 0 2 4 6 1 3 1 7 0 0 0 9 0 0 0 11 0 0 1 , , = hot black red violet 0 100 99 91 1 11 13 18 2 12 6 8 3 5 3 6 4 2 4 3 5 0 1 1 6 0 1 0 7 0 1 3 9 0 1 0 11 0 1 0
Although not new (see for example Daniel Hall's paper: http://onlinelibrary.wiley.com/doi/10.1111/j.0006-341X.2000.01030.x/full), it seems that modeling a binomial response variable with a very low frequency of success in a ZIB is still quite unusual....as Ben Bolker wrote: [...] using a binomial family with zeroInflation=TRUE. If your response is a matrix of successes and failures, that's unusual but plausible [...] (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2012q1/017903.html)
I found five R packages that are apparently able to deal with zero-inflation: glmmADMB, MCMCglmm, pscl, VGAM, R-INLA.
As clusters are nested within plots and the experiment is replicated in 2 seasons, I would like to include a random effect of "cluster nested in plot". pscl and VGAM seem not to support random effects. glmmADMB, MCMCglmm and R-INLA apparently do. As I have been working with lme4 previously, I decided to try glmmADMB first, as the model specification is very similar in both packages.
Testing the effect of season and color on the number of pecked fruits SEPARATELY works well, when I use the follwing models (although the SEs are comparatively large [on the logit-scale]):
---------------------------------------------------------------------------------------------------------------------------
Call: glmmadmb(formula = fruits.bin ~ season + (1 | plotID/clusterID) + (1 | color), data = fruits.av, family = "binomial", zeroInflation = TRUE) AIC: 1543.8 Coefficients: Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.5364 0.3059 -11.6 <2e-16 *** seasonhot -0.0469 0.1184 -0.4 0.69
--- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Number of observations: total=780, plotID=13, plotID:clusterID=130, color=3
Random effect variance(s):
Group=plotID Variance StdDev
(Intercept) 0.7154 0.8458
Group=plotID:clusterID Variance StdDev
(Intercept) 0.9076 0.9527
Group=color Variance StdDev
(Intercept) 0.03397 0.1843 Zero-inflation: 0.40874 (std. err.: 0.04368 ) Log-likelihood: -765.91
[The function overdisp_fun from http://glmm.wikidot.com/faq gives a ratio of 1.40 (pearsonchisq/rdf)].
--------------------------------------------------------------------------------
Call: glmmadmb(formula = fruits.bin ~ color + (1 | plotID/clusterID), data = fruits.av, family = "binomial", zeroInflation = TRUE) AIC: 1538.4 Coefficients: Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.807 0.292 -13.02 <2e-16 *** colorred 0.396 0.137 2.88 0.0039 ** colorviolet 0.367 0.138 2.66 0.0079 **
--- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Number of observations: total=780, plotID=13, plotID:clusterID=130
Random effect variance(s):
Group=plotID Variance StdDev
(Intercept) 0.7011 0.8373
Group=plotID:clusterID Variance StdDev
(Intercept) 0.9091 0.9534 Zero-inflation: 0.41143 (std. err.: 0.043531 ) Log-likelihood: -763.194
[The function overdisp_fun from http://glmm.wikidot.com/faq gives a ratio of 1.42 (pearsonchisq/rdf)]
-------------------------------------------------------------------------------
-----------------------------
1st problem
-----------------------------
However, when I test both season and color TOGETHER plus their interaction, I get a warning (see the summary below for the model specification):
---------------------------------------------------------------------------------------------------------------------------------
GLMM's in R powered by AD Model Builder: Family: binom link = logit Zero inflation: p = 0.40831 Fixed effects: Log-likelihood: -778.26 AIC: 1574.52 Formula: fruits.bin ~ season * color + (1 | plotID/clusterID) (Intercept) seasonhotcolorredcolorvioletseasonhot:colorred -2.7569960 -0.9638953 -0.2266267 -0.2346871 0.8357870 seasonhot:colorviolet 0.8964410 Random effects:
Structure: Diagonal matrix
Group=plotID Variance StdDev
(Intercept) 2.242 1.497
Group=plotID:clusterID Variance StdDev
(Intercept) 0.9694 0.9846 Number of observations: total=780, plotID=13, plotID:clusterID=130 Warning message: In print.glmmadmb(list(n = 780L, q = c(13L, 130L), formula = fruits.bin ~ : Object has a large gradient component
Here is the summary:
-------------------------------------------------------------------------------------------------------------------------------
Call: glmmadmb(formula = fruits.bin ~ season * color + (1 | plotID/clusterID), data = fruits.av, family = "binomial", zeroInflation = TRUE) AIC: 1574.5 Coefficients: Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.757 0.620 -4.44 8.8e-06 *** seasonhot -0.964 0.204 -4.73 2.2e-06 *** colorred -0.227 0.174 -1.30 0.1931 colorviolet -0.235 0.181 -1.29 0.1957 seasonhot:colorred 0.836 0.283 2.96 0.0031 ** seasonhot:colorviolet 0.896 0.279 3.21 0.0013 **
--- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Number of observations: total=780, plotID=13, plotID:clusterID=130
Random effect variance(s):
Group=plotID Variance StdDev
(Intercept) 2.242 1.497
Group=plotID:clusterID Variance StdDev
(Intercept) 0.9694 0.9846 Zero-inflation: 0.40831 (std. err.: 0.044651 ) Log-likelihood: -778.26 Warning message: In glmmadmb(fruits.bin ~ season * color + (1 | plotID/clusterID), : Convergence failed:log-likelihood of gradient= -85.777
[The function overdisp_fun from http://glmm.wikidot.com/faq gives a ratio of 1.88 (pearsonchisq/rdf)]
----------------------------------------------------------------------------------------------------------------------
I tried to work on this, guided by the recommendations Ben Bolker gave here:
http://lists.admb-project.org/pipermail/users/2012-February/001695.html
>> setting verbose=T does not provide any additional information (to me)
>> setting the maximum number of iterations larger does not change anything (I set maxfn=10000)
>> I plotted the raw data, the model predictions of the zero-inflated model and the predictions of a "normal" mixed model (glmer, i.e. without controlling for zero-inflation); I attach them to this email (if this doesn't work with the mailing list, I can also send them to whomever likes to have a look). The predictions of the glmm look quite good. The predictions of the ziglmm lack variation within groups, i.e. there is only one value for all replicates of each combination of season and color; and they seem to be slightly to large - but both could probably be related to the warning message!?
Is it likely that the stronger overdispersion is responsible for this and would it make sense to control for it with an additional observation-level random effect? (I tried it > it didn't work...).
-----------------------------
2nd problem
-----------------------------
In another model, in which I try to investigate on the effect of elevation on the number of pecks on fruits, I get another error message:
> summary(pecks.zibin <- glmmadmb(fruits.bin ~ alt + (1|plotID/clusterID) + (1|color), zeroInflation=TRUE, family="binomial", data=fruits.av)) Parameters were estimated, but not standard errors were not: the most likely problem is that the curvature at MLE was zero or negative Warning message: running command 'C:\Windows\system32\cmd.exe /c "D:/Sonstiges/R_library/glmmADMB/bin/windows64/glmmadmb.exe" -maxfn 500 -maxph 5 -noinit -shess' had status 42 Error in summary(pecks.zibin <- glmmadmb(fruits.bin ~ alt + (1 | plotID/clusterID) + : FehlerbeiderAuswertungdesArgumentes 'object' beiderMethodenauswahlfürFunktion 'summary': Error in glmmadmb(fruits.bin ~ alt + (1 | plotID/clusterID) + (1 | color), : The function maximizer failed (couldn't find STD file) Troubleshooting steps include (1) run with 'save.dir' set and inspect output files; (2) change run parameters: see '?admbControl'
Again I tried to work on this, guided by the trouble shooting steps provided in ?admbControl
and by Ben Bolkers comments here:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2012q2/018545.html
>> setting noinit=F and/or shess=F does not change anything, nor do changes on the other run parameters
>> the output files do not provide any information to me, as I do not know how to open them and what to look for
1) None of the models shown above produces any warnings or errors when I run them with glmer (and thus ignore the zero-inflation).
2) When I sum all pecks on plot level and leave out cluster completely, I get much less zeros (but also much less replicates). When again modeled in glmer without controlling for zero-inflation the results are qualitatively the same as in 1), but the overdispersion is becoming much stronger (ratio ~ 3.5). Correcting for this by incorporating an observation-level random effect, however, leads to strong underdispersion (ratio ~ 0.3) and differing results.
3) I nevertheless think that the model structure should be determined by the design and therefore I would like to go for the version that appreciates the clusters, which means that I have to deal with these zeros.
Does anybody know what I could do to fit these models in glmmADMBsuccessfully? Do I actually need to account for this zero-inflation, the consensus seems to be that one should do so if there are "over-proportionally" many zeros - but when is this threshold passed?
As fitting the models successfully in glmmADMB might be tricky, does anybody know how the model structure should look like in one of the other packages, i.e. MCMCglmm (I found the model specification very tricky here) or R-INLA?
Thank you for your help,
Stefan
--
Stefan Ferger
Biodiversity and Climate Research Centre (BiK-F)
Project Area B: Biodiversity Dynamics and Climate
Senckenberganlage 25
D-60325 Frankfurt am Main
phone: +49 69 7542 1871
fax: +49 69 7542 1800
e-mail: stefan.ferger at senckenberg.de
homepage: www.bik-f.de/root/index.php?page_id=433
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ziglmm-predicted pecks per cluster_original.png
Type: image/png
Size: 3875 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20131023/87310332/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glmm-predicted pecks per cluster_zoomed in.png
Type: image/png
Size: 5040 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20131023/87310332/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pecks per cluster_zoomed in.png
Type: image/png
Size: 4581 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20131023/87310332/attachment-0005.png>
More information about the R-sig-mixed-models
mailing list