[R-sig-ME] Improving residual distribution in glmmTMB

Tue May 16 00:52:55 CEST 2023

Thank you, Ben, that's very helpful. I noticed that you removed the
fixed effect for "Document", that's, I'm guessing, to improve the
model fit (ex. avoiding singularity)? I was hoping to explore how many
times, on average, each unique Document was downloaded. Looks like
that may not be possible.

Thanks,
Tim M

On Mon, May 15, 2023 at 9:37 AM Ben Bolker <bbolker using gmail.com> wrote:
>
>    I don't have any perfect solution, and I don't necessarily recommend
> the brute-force approach tried below, but a truncated negative binomial
> (2/quadratic parameterization) seems best. The genpois and nbinom1 are
> actually wonkiest, in terms of appearance of residuals.
>    The COM-Poisson below is *very* slow, you might want to skip it (it
> took 20 minutes to fit on 16 cores, and the results are not that great
> anyway).
>    I thought about implementing a Poisson-inverse-Gamma to allow for
> heavier tails, but haven't gotten around to it yet (it's not trivial ...)
>    There seems to be something wonky about the reported sigma/Pearson
> residuals from truncated nbinom2, but the results are OK ...
>
> ### Reproducible data and code:
> d <- read.csv("https://raw.githubusercontent.com/fpqq/w/main/b.csv")
> library(glmmTMB)
> library(purrr)
> library(broom.mixed)
> library(dplyr)
> library(DHARMa)
>
> model <- glmmTMB(downs ~ I(ELs/20) +
>                      ## Document +
>                      (1|District) +
>                      (1|Document),
>                  family=genpois(),
>                       #  ziformula = ~ Document,
>                    # dispformula = ~ Document,
>                  data = d)
>
> ## very slow ...
> system.time(model2 <- update(model, family = "compois",
>                               control = glmmTMBControl(parallel = 16)))
>
> ##      user    system   elapsed
> ## 17194.076    11.103  1134.367
>
> ## truncated genpois doesn't work for some reason ...
> fam <- c("genpois", "poisson", "nbinom1", "nbinom2")
> fam <- c(fam, paste0("truncated_", fam[-1]))
> mod_list <- lapply(fam,
>                     function(f) {
>                         cat(f, "\n")
>                         update(model, family = f)
> })
> names(mod_list) <- fam
> mod_list[["compois"]] <- model2
>
>
> aictab <- (map_dfr(mod_list, glance, .id = "family")
>      |> select(-c(nobs,logLik, BIC))
>      |> mutate(across(c(AIC, deviance), ~ . - min(., na.rm = TRUE)))
>      |> arrange(AIC)
> )
>
> ##   family              sigma   AIC df.residual deviance
> ##   <chr>               <dbl> <dbl>       <int>    <dbl>
> ## 1 truncated_nbinom2 3.66e-8    0          569      NA
> ## 2 nbinom2           1.39e+0  665.         569       0
> ## 3 compois           5.29e+9  682.         569      NA
> ## 4 genpois           1.33e+1  768.         569      NA
> ## 5 nbinom1           8.11e+0  981.         569     191.
> ## 6 truncated_poisson 1   e+0 1996.         570      NA
> ## 7 poisson           1   e+0 2553.         570    2077.
> ## 8 truncated_nbinom1 7.84e+0   NA          569      NA
>
> tnb <- mod_list[["truncated_nbinom2"]]
> ## Pearson resid calc messed up ???
> plot(resid(tnb,type="pearson") ~ fitted(tnb))
>
> plot(d$downs, predict(tnb, type = "response"), log = "xy")
> abline(a=0, b=1, col = 2)
>
> ## reorder
> mod_list <- mod_list[aictab$family]
> op <- par(mfrow = c(3, 3), mfrow = c(3,2,1,1), las = 1)
> lapply(mod_list,
>         function(m) plot(resid(m,type="pearson") ~ fitted(m),
>                          xlab = "", ylab = "", main = family(m)$family)
>         )
>
>
> ## DHARMa
> ss <- simulateResiduals(tnb)
> plot(ss)
> plotResiduals(ss, form = d$ELs/20)
>
>
>
> On 2023-05-13 11:55 p.m., Timothy MacKenzie wrote:
> > Greetings,
> >
> > I'm modeling the number of downloads (downs) of certain forms
> > (Document) by a number of school districts (District) of various
> > student sizes (ELs).
> >
> > I've tried genpois(), nbinom1(), and nbinom2() families but the fitted
> > vs. residual doesn't look promising.
> >
> > In glmmTMB, are there any additional ways to improve the residual
> > distribution for count-type data?
> >
> > Thank you,
> > Tim M
> > ### Reproducible data and code:
> > d = read.csv("https://raw.githubusercontent.com/fpqq/w/main/b.csv")
> >
> > model = glmmTMB(downs ~ I(ELs/20) + Document +
> >                             (1|District) +
> >                             (1|Document),
> >                           family=genpois(),
> >                       #  ziformula = ~ Document,
> >                    # dispformula = ~ Document,
> >                           data = d)
> >
> > plot(resid(model,type="pearson") ~ fitted(model))
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> (Acting) Graduate chair, Mathematics & Statistics
>  > E-mail is sent at my convenience; I don't expect replies outside of
> working hours.
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models