[R-sig-ME] Improving residual distribution in glmmTMB
Ben Bolker
bbo|ker @end|ng |rom gm@||@com
Mon May 15 16:37:07 CEST 2023
I don't have any perfect solution, and I don't necessarily recommend
the brute-force approach tried below, but a truncated negative binomial
(2/quadratic parameterization) seems best. The genpois and nbinom1 are
actually wonkiest, in terms of appearance of residuals.
The COM-Poisson below is *very* slow, you might want to skip it (it
took 20 minutes to fit on 16 cores, and the results are not that great
anyway).
I thought about implementing a Poisson-inverse-Gamma to allow for
heavier tails, but haven't gotten around to it yet (it's not trivial ...)
There seems to be something wonky about the reported sigma/Pearson
residuals from truncated nbinom2, but the results are OK ...
### Reproducible data and code:
d <- read.csv("https://raw.githubusercontent.com/fpqq/w/main/b.csv")
library(glmmTMB)
library(purrr)
library(broom.mixed)
library(dplyr)
library(DHARMa)
model <- glmmTMB(downs ~ I(ELs/20) +
## Document +
(1|District) +
(1|Document),
family=genpois(),
# ziformula = ~ Document,
# dispformula = ~ Document,
data = d)
## very slow ...
system.time(model2 <- update(model, family = "compois",
control = glmmTMBControl(parallel = 16)))
## user system elapsed
## 17194.076 11.103 1134.367
## truncated genpois doesn't work for some reason ...
fam <- c("genpois", "poisson", "nbinom1", "nbinom2")
fam <- c(fam, paste0("truncated_", fam[-1]))
mod_list <- lapply(fam,
function(f) {
cat(f, "\n")
update(model, family = f)
})
names(mod_list) <- fam
mod_list[["compois"]] <- model2
aictab <- (map_dfr(mod_list, glance, .id = "family")
|> select(-c(nobs,logLik, BIC))
|> mutate(across(c(AIC, deviance), ~ . - min(., na.rm = TRUE)))
|> arrange(AIC)
)
## family sigma AIC df.residual deviance
## <chr> <dbl> <dbl> <int> <dbl>
## 1 truncated_nbinom2 3.66e-8 0 569 NA
## 2 nbinom2 1.39e+0 665. 569 0
## 3 compois 5.29e+9 682. 569 NA
## 4 genpois 1.33e+1 768. 569 NA
## 5 nbinom1 8.11e+0 981. 569 191.
## 6 truncated_poisson 1 e+0 1996. 570 NA
## 7 poisson 1 e+0 2553. 570 2077.
## 8 truncated_nbinom1 7.84e+0 NA 569 NA
tnb <- mod_list[["truncated_nbinom2"]]
## Pearson resid calc messed up ???
plot(resid(tnb,type="pearson") ~ fitted(tnb))
plot(d$downs, predict(tnb, type = "response"), log = "xy")
abline(a=0, b=1, col = 2)
## reorder
mod_list <- mod_list[aictab$family]
op <- par(mfrow = c(3, 3), mfrow = c(3,2,1,1), las = 1)
lapply(mod_list,
function(m) plot(resid(m,type="pearson") ~ fitted(m),
xlab = "", ylab = "", main = family(m)$family)
)
## DHARMa
ss <- simulateResiduals(tnb)
plot(ss)
plotResiduals(ss, form = d$ELs/20)
On 2023-05-13 11:55 p.m., Timothy MacKenzie wrote:
> Greetings,
>
> I'm modeling the number of downloads (downs) of certain forms
> (Document) by a number of school districts (District) of various
> student sizes (ELs).
>
> I've tried genpois(), nbinom1(), and nbinom2() families but the fitted
> vs. residual doesn't look promising.
>
> In glmmTMB, are there any additional ways to improve the residual
> distribution for count-type data?
>
> Thank you,
> Tim M
> ### Reproducible data and code:
> d = read.csv("https://raw.githubusercontent.com/fpqq/w/main/b.csv")
>
> model = glmmTMB(downs ~ I(ELs/20) + Document +
> (1|District) +
> (1|Document),
> family=genpois(),
> # ziformula = ~ Document,
> # dispformula = ~ Document,
> data = d)
>
> plot(resid(model,type="pearson") ~ fitted(model))
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of
working hours.
More information about the R-sig-mixed-models
mailing list