[R-sig-ME] Improving residual distribution in glmmTMB

Tue May 16 02:46:00 CEST 2023

I'm likely misunderstanding your recommendation for using predict()
for each Document. If our model has no fixed effect for Document, I'm
wondering how then we can predict() how often each document is
downloaded?

Do you mean averaging across predictions for each category of Document as in:

library(dplyr)
data.frame(d, fitted = predict(model, type = "response")) %>%
  group_by(Document) %>% mutate(ave_downs = mean(downs))

Tim M

model <- glmmTMB(downs ~ I(ELs/20) +
                    ## Document +
                   (1|District)+
                   (1|Document),
                 family=truncated_nbinom2(),
                 data = d)

On Mon, May 15, 2023 at 6:18 PM Ben Bolker <bbolker using gmail.com> wrote:
>
>    In general it doesn't make sense to include any term as both a fixed
> effect and a random-effect grouping variable (with a few exceptions).
>
>    There's no reason you can't examine the model predictions of how
> often each document is downloaded (i.e., use predict() with each
> document and whatever baseline covariate settings you want)
>
> On 2023-05-15 6:52 p.m., Timothy MacKenzie wrote:
> > Thank you, Ben, that's very helpful. I noticed that you removed the
> > fixed effect for "Document", that's, I'm guessing, to improve the
> > model fit (ex. avoiding singularity)? I was hoping to explore how many
> > times, on average, each unique Document was downloaded. Looks like
> > that may not be possible.
> >
> > Thanks,
> > Tim M
> >
> >
> >
> >
> >
> > On Mon, May 15, 2023 at 9:37 AM Ben Bolker <bbolker using gmail.com> wrote:
> >>
> >>     I don't have any perfect solution, and I don't necessarily recommend
> >> the brute-force approach tried below, but a truncated negative binomial
> >> (2/quadratic parameterization) seems best. The genpois and nbinom1 are
> >> actually wonkiest, in terms of appearance of residuals.
> >>     The COM-Poisson below is *very* slow, you might want to skip it (it
> >> took 20 minutes to fit on 16 cores, and the results are not that great
> >> anyway).
> >>     I thought about implementing a Poisson-inverse-Gamma to allow for
> >> heavier tails, but haven't gotten around to it yet (it's not trivial ...)
> >>     There seems to be something wonky about the reported sigma/Pearson
> >> residuals from truncated nbinom2, but the results are OK ...
> >>
> >> ### Reproducible data and code:
> >> d <- read.csv("https://raw.githubusercontent.com/fpqq/w/main/b.csv")
> >> library(glmmTMB)
> >> library(purrr)
> >> library(broom.mixed)
> >> library(dplyr)
> >> library(DHARMa)
> >>
> >> model <- glmmTMB(downs ~ I(ELs/20) +
> >>                       ## Document +
> >>                       (1|District) +
> >>                       (1|Document),
> >>                   family=genpois(),
> >>                        #  ziformula = ~ Document,
> >>                     # dispformula = ~ Document,
> >>                   data = d)
> >>
> >> ## very slow ...
> >> system.time(model2 <- update(model, family = "compois",
> >>                                control = glmmTMBControl(parallel = 16)))
> >>
> >> ##      user    system   elapsed
> >> ## 17194.076    11.103  1134.367
> >>
> >> ## truncated genpois doesn't work for some reason ...
> >> fam <- c("genpois", "poisson", "nbinom1", "nbinom2")
> >> fam <- c(fam, paste0("truncated_", fam[-1]))
> >> mod_list <- lapply(fam,
> >>                      function(f) {
> >>                          cat(f, "\n")
> >>                          update(model, family = f)
> >> })
> >> names(mod_list) <- fam
> >> mod_list[["compois"]] <- model2
> >>
> >>
> >> aictab <- (map_dfr(mod_list, glance, .id = "family")
> >>       |> select(-c(nobs,logLik, BIC))
> >>       |> mutate(across(c(AIC, deviance), ~ . - min(., na.rm = TRUE)))
> >>       |> arrange(AIC)
> >> )
> >>
> >> ##   family              sigma   AIC df.residual deviance
> >> ##   <chr>               <dbl> <dbl>       <int>    <dbl>
> >> ## 1 truncated_nbinom2 3.66e-8    0          569      NA
> >> ## 2 nbinom2           1.39e+0  665.         569       0
> >> ## 3 compois           5.29e+9  682.         569      NA
> >> ## 4 genpois           1.33e+1  768.         569      NA
> >> ## 5 nbinom1           8.11e+0  981.         569     191.
> >> ## 6 truncated_poisson 1   e+0 1996.         570      NA
> >> ## 7 poisson           1   e+0 2553.         570    2077.
> >> ## 8 truncated_nbinom1 7.84e+0   NA          569      NA
> >>
> >> tnb <- mod_list[["truncated_nbinom2"]]
> >> ## Pearson resid calc messed up ???
> >> plot(resid(tnb,type="pearson") ~ fitted(tnb))
> >>
> >> plot(d$downs, predict(tnb, type = "response"), log = "xy")
> >> abline(a=0, b=1, col = 2)
> >>
> >> ## reorder
> >> mod_list <- mod_list[aictab$family]
> >> op <- par(mfrow = c(3, 3), mfrow = c(3,2,1,1), las = 1)
> >> lapply(mod_list,
> >>          function(m) plot(resid(m,type="pearson") ~ fitted(m),
> >>                           xlab = "", ylab = "", main = family(m)$family)
> >>          )
> >>
> >>
> >> ## DHARMa
> >> ss <- simulateResiduals(tnb)
> >> plot(ss)
> >> plotResiduals(ss, form = d$ELs/20)
> >>
> >>
> >>
> >> On 2023-05-13 11:55 p.m., Timothy MacKenzie wrote:
> >>> Greetings,
> >>>
> >>> I'm modeling the number of downloads (downs) of certain forms
> >>> (Document) by a number of school districts (District) of various
> >>> student sizes (ELs).
> >>>
> >>> I've tried genpois(), nbinom1(), and nbinom2() families but the fitted
> >>> vs. residual doesn't look promising.
> >>>
> >>> In glmmTMB, are there any additional ways to improve the residual
> >>> distribution for count-type data?
> >>>
> >>> Thank you,
> >>> Tim M
> >>> ### Reproducible data and code:
> >>> d = read.csv("https://raw.githubusercontent.com/fpqq/w/main/b.csv")
> >>>
> >>> model = glmmTMB(downs ~ I(ELs/20) + Document +
> >>>                              (1|District) +
> >>>                              (1|Document),
> >>>                            family=genpois(),
> >>>                        #  ziformula = ~ Document,
> >>>                     # dispformula = ~ Document,
> >>>                            data = d)
> >>>
> >>> plot(resid(model,type="pearson") ~ fitted(model))
> >>>
> >>> _______________________________________________
> >>> R-sig-mixed-models using r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
> >> --
> >> Dr. Benjamin Bolker
> >> Professor, Mathematics & Statistics and Biology, McMaster University
> >> Director, School of Computational Science and Engineering
> >> (Acting) Graduate chair, Mathematics & Statistics
> >>   > E-mail is sent at my convenience; I don't expect replies outside of
> >> working hours.
> >>
> >> _______________________________________________
> >> R-sig-mixed-models using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> (Acting) Graduate chair, Mathematics & Statistics
>  > E-mail is sent at my convenience; I don't expect replies outside of
> working hours.