[R-sig-ME] Improving residual distribution in glmmTMB

Mon May 15 16:37:07 CEST 2023

   I don't have any perfect solution, and I don't necessarily recommend 
the brute-force approach tried below, but a truncated negative binomial 
(2/quadratic parameterization) seems best. The genpois and nbinom1 are 
actually wonkiest, in terms of appearance of residuals.
   The COM-Poisson below is *very* slow, you might want to skip it (it 
took 20 minutes to fit on 16 cores, and the results are not that great 
anyway).
   I thought about implementing a Poisson-inverse-Gamma to allow for 
heavier tails, but haven't gotten around to it yet (it's not trivial ...)
   There seems to be something wonky about the reported sigma/Pearson 
residuals from truncated nbinom2, but the results are OK ...

### Reproducible data and code:
d <- read.csv("https://raw.githubusercontent.com/fpqq/w/main/b.csv")
library(glmmTMB)
library(purrr)
library(broom.mixed)
library(dplyr)
library(DHARMa)

model <- glmmTMB(downs ~ I(ELs/20) +
                     ## Document +
                     (1|District) +
                     (1|Document),
                 family=genpois(),
                      #  ziformula = ~ Document,
                   # dispformula = ~ Document,
                 data = d)

## very slow ...
system.time(model2 <- update(model, family = "compois",
                              control = glmmTMBControl(parallel = 16)))

##      user    system   elapsed
## 17194.076    11.103  1134.367

## truncated genpois doesn't work for some reason ...
fam <- c("genpois", "poisson", "nbinom1", "nbinom2")
fam <- c(fam, paste0("truncated_", fam[-1]))
mod_list <- lapply(fam,
                    function(f) {
                        cat(f, "\n")
                        update(model, family = f)
})
names(mod_list) <- fam
mod_list[["compois"]] <- model2

aictab <- (map_dfr(mod_list, glance, .id = "family")
     |> select(-c(nobs,logLik, BIC))
     |> mutate(across(c(AIC, deviance), ~ . - min(., na.rm = TRUE)))
     |> arrange(AIC)
)

##   family              sigma   AIC df.residual deviance
##   <chr>               <dbl> <dbl>       <int>    <dbl>
## 1 truncated_nbinom2 3.66e-8    0          569      NA
## 2 nbinom2           1.39e+0  665.         569       0
## 3 compois           5.29e+9  682.         569      NA
## 4 genpois           1.33e+1  768.         569      NA
## 5 nbinom1           8.11e+0  981.         569     191.
## 6 truncated_poisson 1   e+0 1996.         570      NA
## 7 poisson           1   e+0 2553.         570    2077.
## 8 truncated_nbinom1 7.84e+0   NA          569      NA

tnb <- mod_list[["truncated_nbinom2"]]
## Pearson resid calc messed up ???
plot(resid(tnb,type="pearson") ~ fitted(tnb))

plot(d$downs, predict(tnb, type = "response"), log = "xy")
abline(a=0, b=1, col = 2)

## reorder
mod_list <- mod_list[aictab$family]
op <- par(mfrow = c(3, 3), mfrow = c(3,2,1,1), las = 1)
lapply(mod_list,
        function(m) plot(resid(m,type="pearson") ~ fitted(m),
                         xlab = "", ylab = "", main = family(m)$family)
        )

## DHARMa
ss <- simulateResiduals(tnb)
plot(ss)
plotResiduals(ss, form = d$ELs/20)

On 2023-05-13 11:55 p.m., Timothy MacKenzie wrote:
> Greetings,
> 
> I'm modeling the number of downloads (downs) of certain forms
> (Document) by a number of school districts (District) of various
> student sizes (ELs).
> 
> I've tried genpois(), nbinom1(), and nbinom2() families but the fitted
> vs. residual doesn't look promising.
> 
> In glmmTMB, are there any additional ways to improve the residual
> distribution for count-type data?
> 
> Thank you,
> Tim M
> ### Reproducible data and code:
> d = read.csv("https://raw.githubusercontent.com/fpqq/w/main/b.csv")
> 
> model = glmmTMB(downs ~ I(ELs/20) + Document +
>                             (1|District) +
>                             (1|Document),
>                           family=genpois(),
>                       #  ziformula = ~ Document,
>                    # dispformula = ~ Document,
>                           data = d)
> 
> plot(resid(model,type="pearson") ~ fitted(model))
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
 > E-mail is sent at my convenience; I don't expect replies outside of 
working hours.