[R-sig-ME] different overdispersion parameter for binomial GLMM in lme4, glmmADMB and glmmTMB
Ben Bolker
bbolker at gmail.com
Fri Mar 16 19:19:09 CET 2018
Ariel, can you post this as an issue at
https://github.com/glmmTMB/glmmTMB/issues ? This stuff is really hard
to get right ...
Thomas: I think an important point is that it really only makes much
sense to focus on the dispersion in a model that does *NOT* already
have a dispersion parameter estimated. If you want to choose among
models with different forms of dispersion, I would probably suggest
AIC (which reduces to comparing log-likelihood for models with the
same number of parameters -- if the two models aren't nested, as they
probably aren't, you can't do a statistical test, but you can still
see which one fits the data better).
On Fri, Mar 16, 2018 at 12:31 PM, Muldoon, Ariel
<Ariel.Muldoon at oregonstate.edu> wrote:
> Hi, Thomas-
>
> After doing a little poking around, my impression is that the glmmTMB model may not be returning the correct values for the residuals for a binomial model.
>
> I fit the model from the "Examples" in both glmmTMB (development version) and lme4, using the two different coding approaches that are often used for fitting binomial models (one with a matrix response and one with a proportion plus number of trials as weights).
>
> The two lme4 models return identical results, so there wasn't much reason to pursue them both. The second model (tmbm2) is like the model you are fitting.
>
> library(glmmTMB)
> library(lme4)
>
> tmbm1 = glmmTMB(cbind(incidence, size - incidence) ~ period + (1 | herd),
> data = cbpp, family = binomial)
>
> tmbm2 = glmmTMB(incidence/size ~ period + (1 | herd),
> data = cbpp, weights = size, family = binomial)
>
> tmbm3 = glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
> data = cbpp, family = binomial)
>
> tmbm4 = glmer(incidence/size ~ period + (1 | herd),
> data = cbpp, family = binomial, weights = size)
>
> Drilling down for reasons for residuals to be different, I found that the second model returns different fitted values than the others.
>
> fitted(tmbm1)
> fitted(tmbm2)
> fitted(tmbm4)
>
> I can't tell you why this is and I can't be certain the values from the other models are correct, but if I manually calculate the fitted values for the second model the results match the values from the other three models. (You'll see slight differences between the lme4 and glmmTMB models due to slight differences in estimates of the conditional modes of the random effects.)
>
> # Manual calculation of fitted values
> plogis( (getME(tmbm2, "X") %*% fixef(tmbm2)$cond) +
> rep(unlist(ranef(tmbm2)$cond),
> times = c(4L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L)) )
>
> A good approach then might be to use the matrix-style response variable in glmmTMB. However, the residuals.glmmTMB function then uses the matrix response when calculating residuals instead of a proportion response.
>
> # Calculate response residuals for matrix response
> residuals(tmbm1)
> model.response(tmbm1$frame) - fitted(tmbm1)
> # Here is what these should be, which matches lme4
> model.response(tmbm1$frame)[,1]/cbpp$size - fitted(tmbm1)
> getME(tmbm4, "y") - fitted(tmbm4)
>
> The last difference I found is in the calculation of the pearson residuals. I believe glmmTMB uses mu*(1-mu) to calculate the variance that is used in the divisor (I base this on family(tmbm1)$variance). However, I think lme4 uses the variance of a proportion, mu*(1-mu)/m (where m is the binomial sample size/number of trials).
>
> residuals(tmbm4, type = "pearson")
> ( getME(tmbm4, "y") - fitted(tmbm4) )/sqrt( fitted(tmbm3)*(1-fitted(tmbm3))/cbpp$size )
>
> I'm not sure any of this helps you decide how to move forward, but I definitely wouldn't dismiss the overdispersion you are seeing in your lme4 model yet.
>
> Ariel
>
>
> -----Original Message-----
> From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Thomas Merkling
> Sent: Thursday, March 15, 2018 6:51 PM
> To: r-sig-mixed-models at r-project.org
> Subject: [R-sig-ME] different overdispersion parameter for binomial GLMM in lme4, glmmADMB and glmmTMB
>
> Hi all,
>
> I'm trying to model proportion data using a binomial GLMM. I first started with lme4 and saw that my data were overdispersed.
>
> LME4Inter <- glmer(PropInter ~ Manip + Observer +(1|Date), family=binomial, weights = NbAg, data = senior, control=glmerControl(optimizer="bobyqa",optCtrl = list(maxfun=100000)))
>
> Using the dispersion_glmer (provided by Ben Bolker, see below for code), I got a dispersion parameter of 1.99 (as expected the outcome is the same with glmmADMB).
>
> I then tried to add an observation level random effect, and the dispersion parameter dropped to 0.24. With such a small dispersion parameter, my model becomes overly conservative and I'll probably not be able to detect any (true) effect (if I understand correctly). Would you stick to this model anyways?
>
> I started investigating beta-binomial models to see if they could better model overdispersion (and not be too conservative) and wanted to use glmmTMB to do that.
>
> I first tried to refit the binomial GLMM with glmmTMB and calculate the dispersion parameter using the dispfun function (also provided by Ben Bolker, see below for code), and got a very different response: the dispersion parameter was 0.514. If this is correct, I wouldn't have to worry about overdispersion.
>
> TMBInter <- glmmTMB(PropInter ~ Manip + Observer +(1|Date), family=binomial, weights = NbAg, data = senior)
>
> The estimates and SEs were identical between the 3 packages, so I am fitting correctly the exact same models. However, the residuals between glmer/glmmADMB and glmmTMB models were quite different. Is that expected?
> I tried to run Poisson GLMMs with the 2 packages and got identical dispersion parameters, so the dispfun function works well for glmmTMB too.
>
> Which dispersion parameter value should I trust for the binomial GLMM?
> Is there something specific about binomial GLMMs in glmmTMB?
>
> Thanks in advance for your help!
>
> Thomas Merkling
>
>
> dispersion_glmer <- function(model) {
> ## number of variance parameters in
> ## an n-by-n variance-covariance matrix
> vpars <- function(m) {
> nrow(m)*(nrow(m)+1)/2
> }
> model.df <- sum(sapply(VarCorr(model),vpars))+length(fixef(model))
> rdf <- nrow(model.frame(model))-model.df
> rp <- residuals(model,type="pearson")
> Pearson.chisq <- sum(rp^2)
> prat <- Pearson.chisq/rdf
> pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
> c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
> }
>
> dispfun <- function(m) {
> r <- residuals(m,type="pearson")
> n <- df.residual(m)
> dsq <- sum(r^2)
> c(dsq=dsq,n=n,disp=dsq/n)
> }
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
More information about the R-sig-mixed-models
mailing list