[R-sig-ME] different overdispersion parameter for binomial GLMM in lme4, glmmADMB and glmmTMB

Fri Mar 16 21:39:00 CET 2018

Thanks Ariel and Ben for your answers!

Following Ariel's troubleshooting, I managed to modify the function I 
was using to match the results from lme4:

dispfun <- function(m, trials) {
# where m is the glmmTMB model and trials a vector of number of trials 
for each observation
   r <- (model.response(m$frame)[,1]/trials - 
fitted(m))/sqrt(fitted(m)*(1-fitted(m))/trials)
   n <- df.residual(m)
   dsq <- sum(r^2)
   c(dsq=dsq,n=n,disp=dsq/n)
}

Ben, sorry but I am bit confused by your answer. If I understand 
correctly, the approach you would recommend is to calculate the 
dispersion parameter on the binomial model and if there is 
overdispersion compare models with different ways to deal with it (e.g 
observation-level random effects and beta-binomial) to the binomial one 
to find out which ones fits the data better. Is that correct? And so 
there would be no point in calculating the dispersion parameter for the 
OLRE and beta-binomial model and see how much it goes down?

Cheers,
Thomas

On 16/03/2018 14:19, Ben Bolker wrote:
> Ariel, can you post this as an issue at
> https://github.com/glmmTMB/glmmTMB/issues ?  This stuff is really hard
> to get right ...
>
>    Thomas: I think an important point is that it really only makes much
> sense to focus on the dispersion in a model that does *NOT* already
> have a dispersion parameter estimated.  If you want to choose among
> models with different forms of dispersion, I would probably suggest
> AIC (which reduces to comparing log-likelihood for models with the
> same number of parameters -- if the two models aren't nested, as they
> probably aren't, you can't do a statistical test, but you can still
> see which one fits the data better).
>
>
>
> On Fri, Mar 16, 2018 at 12:31 PM, Muldoon, Ariel
> <Ariel.Muldoon at oregonstate.edu> wrote:
>> Hi, Thomas-
>>
>> After doing a little poking around, my impression is that the glmmTMB model may not be returning the correct values for the residuals for a binomial model.
>>
>> I fit the model from the "Examples" in both glmmTMB (development version) and lme4, using the two different coding approaches that are often used for fitting binomial models (one with a matrix response and one with a proportion plus number of trials as weights).
>>
>> The two lme4 models return identical results, so there wasn't much reason to pursue them both.  The second model (tmbm2) is like the model you are fitting.
>>
>> library(glmmTMB)
>> library(lme4)
>>
>> tmbm1 = glmmTMB(cbind(incidence, size - incidence) ~ period + (1 | herd),
>>                    data = cbpp, family = binomial)
>>
>> tmbm2 = glmmTMB(incidence/size ~ period + (1 | herd),
>>                    data = cbpp, weights = size, family = binomial)
>>
>> tmbm3 = glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
>>                    data = cbpp, family = binomial)
>>
>> tmbm4 = glmer(incidence/size ~ period + (1 | herd),
>>                  data = cbpp, family = binomial, weights = size)
>>
>> Drilling down for reasons for residuals to be different, I found that the second model returns different fitted values than the others.
>>
>> fitted(tmbm1)
>> fitted(tmbm2)
>> fitted(tmbm4)
>>
>> I can't tell you why this is and I can't be certain the values from the other models are correct, but if I manually calculate the fitted values for the second model the results match the values from the other three models.  (You'll see slight differences between the lme4 and glmmTMB models due to slight differences in estimates of the conditional modes of the random effects.)
>>
>> # Manual calculation of fitted values
>> plogis( (getME(tmbm2, "X") %*% fixef(tmbm2)$cond) +
>>               rep(unlist(ranef(tmbm2)$cond),
>>                   times = c(4L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L)) )
>>
>> A good approach then might be to use the matrix-style response variable in glmmTMB.  However, the residuals.glmmTMB function then uses the matrix response when calculating residuals instead of a proportion response.
>>
>> # Calculate response residuals for matrix response
>> residuals(tmbm1)
>> model.response(tmbm1$frame) - fitted(tmbm1)
>> # Here is what these should be, which matches lme4
>> model.response(tmbm1$frame)[,1]/cbpp$size - fitted(tmbm1)
>> getME(tmbm4, "y") - fitted(tmbm4)
>>
>> The last difference I found is in the calculation of the pearson residuals.  I believe glmmTMB uses mu*(1-mu) to calculate the variance that is used in the divisor (I base this on family(tmbm1)$variance).  However, I think lme4 uses the variance of a proportion, mu*(1-mu)/m (where m is the binomial sample size/number of trials).
>>
>> residuals(tmbm4, type = "pearson")
>> ( getME(tmbm4, "y") - fitted(tmbm4) )/sqrt( fitted(tmbm3)*(1-fitted(tmbm3))/cbpp$size )
>>
>> I'm not sure any of this helps you decide how to move forward, but I definitely wouldn't dismiss the overdispersion you are seeing in your lme4 model yet.
>>
>> Ariel
>>
>>
>> -----Original Message-----
>> From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Thomas Merkling
>> Sent: Thursday, March 15, 2018 6:51 PM
>> To: r-sig-mixed-models at r-project.org
>> Subject: [R-sig-ME] different overdispersion parameter for binomial GLMM in lme4, glmmADMB and glmmTMB
>>
>> Hi all,
>>
>> I'm trying to model proportion data using a binomial GLMM. I first started with lme4 and saw that my data were overdispersed.
>>
>> LME4Inter <- glmer(PropInter ~ Manip + Observer  +(1|Date), family=binomial, weights = NbAg, data = senior, control=glmerControl(optimizer="bobyqa",optCtrl = list(maxfun=100000)))
>>
>> Using the dispersion_glmer (provided by Ben Bolker, see below for code), I got a dispersion parameter of 1.99 (as expected the outcome is the same with glmmADMB).
>>
>> I then tried to add an observation level random effect, and the dispersion parameter dropped to 0.24. With such a small dispersion parameter, my model becomes overly conservative and I'll probably not be able to detect any (true) effect (if I understand correctly). Would you stick to this model anyways?
>>
>> I started investigating beta-binomial models to see if they could better model overdispersion (and not be too conservative) and wanted to use glmmTMB to do that.
>>
>> I first tried to refit the binomial GLMM with glmmTMB and calculate the dispersion parameter using the dispfun function (also provided by Ben Bolker, see below for code), and got a very different response: the dispersion parameter was 0.514. If this is correct, I wouldn't have to worry about overdispersion.
>>
>> TMBInter <- glmmTMB(PropInter ~ Manip + Observer  +(1|Date), family=binomial, weights = NbAg, data = senior)
>>
>> The estimates and SEs were identical between the 3 packages, so I am fitting correctly the exact same models. However, the residuals between glmer/glmmADMB and glmmTMB models were quite different. Is that expected?
>> I tried to run Poisson GLMMs with the 2 packages and got identical dispersion parameters, so the dispfun function works well for glmmTMB too.
>>
>> Which dispersion parameter value should I trust for the binomial GLMM?
>> Is there something specific about binomial GLMMs in glmmTMB?
>>
>> Thanks in advance for your help!
>>
>> Thomas Merkling
>>
>>
>> dispersion_glmer <- function(model) {
>>     ## number of variance parameters in
>>     ##   an n-by-n variance-covariance matrix
>>     vpars <- function(m) {
>>       nrow(m)*(nrow(m)+1)/2
>>     }
>>     model.df <- sum(sapply(VarCorr(model),vpars))+length(fixef(model))
>>     rdf <- nrow(model.frame(model))-model.df
>>     rp <- residuals(model,type="pearson")
>>     Pearson.chisq <- sum(rp^2)
>>     prat <- Pearson.chisq/rdf
>>     pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
>>     c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
>> }
>>
>> dispfun <- function(m) {
>>     r <- residuals(m,type="pearson")
>>     n <- df.residual(m)
>>     dsq <- sum(r^2)
>>     c(dsq=dsq,n=n,disp=dsq/n)
>> }
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models