[R-sig-ME] Help - I have an underdispersed glmm :(

Juan F Dueñas jduenas at zedat.fu-berlin.de
Mon Jan 22 09:56:22 CET 2018

Hello Ben and everybody,

Thanks for your answer. I will try to answer to your questions in order.

Maybe I expressed myself wrong. I have no good reason to expect q1 to
follow poisson. All I understand - in my comic book type understanding-
is that shannon's H is a measure of diversity that results from the
weighting of the number of species in a sample by the individual
abundance within each species. Therefore q1 is a proportion and in that
case is not formally a count, so poisson will not work.

As to what sort of variance structure to expect from this response
variable, what I can say is that when I ignore the random effects and
fit a glm with gaussian (link="log"), there is almost no
heteroscedasticity, which is confirmed by a Bartlett's test
(below). However, the Q-Q plot of this same model shows that the std.dev
residuals do not adjust well to normality. I then re-specify the same
glm with a gamma distribution and log link function and that seems to
resolve both the heteroscedasticity and the fit of the std.dev

Bartlett test of homogeneity of variances
data: resid(glm) by meta.ec.n$Treatment
Bartlett's K-squared = 0.94193, df = 3, p-value = 0.8153

Finally, I am not aware of any theoretical framework for the
distribution of this variable. And the reason for not simply
transforming my response variable to linearize the relationship and then
fit my data with lmer is that I was under the impression that GLMM are a
more efficient way to deal with these issues. The other reason is that I
am worried that the transformation over corrects the residual fit to
the theoretical distribution.

In addition to the statements of the glm, I ran a boxcox command (from
package:MASS) on the glm. It seemed to confirm that log transformation
of the response variable would be the best shot. I then specify a linear
mixed effect model, with the log transformed variable. However the
diagnostic plots, showed some heteroscedasticity and not that optimal fit to normality.

lme.1 <- lmer(log.q1~Treatment +(1|Elevation) + (1|Elevation:Plot), data=dat)

Do you think it would be better then to go for the linear model transforming the data, instead of the glmm?

Best regards,


>    Can you say a little more about why you expect q1 to be
> Poisson-distributed, or more generally what mean-variance relationship
> you expect?  Is there a mechanistic/theoretical framework for the
> distribution of this variable?  Some reason not to find a transform that
> makes the responses reasonably homoscedastic and linear?
>    In general, it *only* makes sense to compute/test dispersion for model
> families where the variance has a fixed relationship with the mean
> (binomial, Poisson, ...), not when there is an estimated scale parameter
> (Gaussian, Gamma, ...)
>    Ben Bolker
> On 18-01-19 08:29 AM, Juan Dueñas wrote:
> >/Dear all, />//>//>/I wish to describe the relationship between the diversity of soil fungi />/and the application of different nutrients (fertilization). My response />/variable is the exponentiated Shannon index of diversity (q1). The />/explanatory variable has four levels. Each of the treatment factors was />/applied at the plot level and there are four replicates of each factor />/per elevation. Six randomly distributed soil cores were taken within />/each of the plots. />//>/For the GLMMs I used lme4 package version 1.1-15, and vegan 2.4-4 to />/estimate q1. />//>/One of the problems I have is that q1 takes decimal values, therefore it />/would be inappropriate (or impossible?) to fit my response variable with />/a poisson probability distribution. Therefore I tried gamma for the />/model specification with a log link function. I performed model />/selection with pairwise likelihood ratio tests. />//>/I then checked my favored model for over-dispersion (which is depicted />/in the output below). It seems, that the model is under dispersed! I was />/checking the literature for solutions to this issue, but I could only />/find some vague notions, namely that some level of underdispersion is />/tolerated. In the case of overdispersion, it is recommended to use />/quasilikelihood, but apparently this solution has been disabled a while />/ago in lme4. />//>/Generalized linear mixed model fit by maximum likelihood (Laplace />/Approximation) ['glmerMod'] />/Family: Gamma ( log ) />/Formula: q1 ~ Treatment + (1 | Elevation) + (1 | Elevation:Plot) />/Data: dat />/Control: glmerControl(optimizer = "nlminbw") />//>/AIC BIC logLik deviance df.resid />/1523.6 1547.9 -754.8 1509.6 231 />//>/Scaled residuals: />/Min 1Q Median 3Q Max />/-2.0938 -0.6378 -0.0694 0.5815 3.1634 />//>/Random effects: />/Groups Name Variance Std.Dev. />/Elevation:Plot (Intercept) 0.02632 0.1622 />/Elevation (Intercept) 0.01366 0.1169 />/Residual 0.17924 0.4234 />/Number of obs: 238, groups: Elevation:Plot, 47; Elevation, 3 />//>/Fixed effects: />/Estimate Std. Error t value Pr(>|z|) />/(Intercept) 2.63742 0.16504 15.981 <2e-16 *** />/TreatmentN -0.08395 0.13284 -0.632 0.527 />/TreatmentNP -0.15163 0.12964 -1.170 0.242 />/TreatmentP -0.12925 0.12998 -0.994 0.320 />/--- />/Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 />//>/Correlation of Fixed Effects: />/(Intr) TrtmnN TrtmNP />/TreatmentN -0.412 />/TreatmentNP -0.418 0.522 />/TreatmentP -0.417 0.524 0.535 />//>//>/chisq ratio rdf p />/38.4696552 0.1658175 232.0000000 1.0000000 />//>//>/My concrete questions are: Should I be concerned that my model is />/underdispersed? Will the coeficients of the fixed terms be reliable in />/this scenario? />//>//>/I appreciate any help on this regard. />//>//>/Best regards, />//>/Juan F. Dueñas />//>/_______________________________________________ />/R-sig-mixed-models at r-project.org 
> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> mailing list />/https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models /

More information about the R-sig-mixed-models mailing list