[R-sig-ME] Improving residual distribution in glmmTMB

Tue May 16 03:45:59 CEST 2023

Many thanks, I wasn't aware of this. Looking forward to one day trying
the same data with the family=Poisson-inverse-Gamma().

Much appreciated,
Tim M

On Mon, May 15, 2023 at 7:52 PM Ben Bolker <bbolker using gmail.com> wrote:
>
>    By default (the re.form argument to predict() is set to NULL), the
> predictions include the random effects.  So a 'newdata' data frame like:
>
> ELs   Document  District
> 20    1         NA
> 20    2         NA
> 20    3         NA
>
> would give you the predicted values for document 1, 2, 3, given ELs =
> 20, and making a population-level prediction for District (i.e. *not*
> using the District RE/setting this info to 0)
>
> On 2023-05-15 8:46 p.m., Timothy MacKenzie wrote:
> > I'm likely misunderstanding your recommendation for using predict()
> > for each Document. If our model has no fixed effect for Document, I'm
> > wondering how then we can predict() how often each document is
> > downloaded?
> >
> > Do you mean averaging across predictions for each category of Document as in:
> >
> > library(dplyr)
> > data.frame(d, fitted = predict(model, type = "response")) %>%
> >    group_by(Document) %>% mutate(ave_downs = mean(downs))
> >
> > Tim M
> >
> > model <- glmmTMB(downs ~ I(ELs/20) +
> >                      ## Document +
> >                     (1|District)+
> >                     (1|Document),
> >                   family=truncated_nbinom2(),
> >                   data = d)
> >
> > On Mon, May 15, 2023 at 6:18 PM Ben Bolker <bbolker using gmail.com> wrote:
> >>
> >>     In general it doesn't make sense to include any term as both a fixed
> >> effect and a random-effect grouping variable (with a few exceptions).
> >>
> >>     There's no reason you can't examine the model predictions of how
> >> often each document is downloaded (i.e., use predict() with each
> >> document and whatever baseline covariate settings you want)
> >>
> >> On 2023-05-15 6:52 p.m., Timothy MacKenzie wrote:
> >>> Thank you, Ben, that's very helpful. I noticed that you removed the
> >>> fixed effect for "Document", that's, I'm guessing, to improve the
> >>> model fit (ex. avoiding singularity)? I was hoping to explore how many
> >>> times, on average, each unique Document was downloaded. Looks like
> >>> that may not be possible.
> >>>
> >>> Thanks,
> >>> Tim M
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Mon, May 15, 2023 at 9:37 AM Ben Bolker <bbolker using gmail.com> wrote:
> >>>>
> >>>>      I don't have any perfect solution, and I don't necessarily recommend
> >>>> the brute-force approach tried below, but a truncated negative binomial
> >>>> (2/quadratic parameterization) seems best. The genpois and nbinom1 are
> >>>> actually wonkiest, in terms of appearance of residuals.
> >>>>      The COM-Poisson below is *very* slow, you might want to skip it (it
> >>>> took 20 minutes to fit on 16 cores, and the results are not that great
> >>>> anyway).
> >>>>      I thought about implementing a Poisson-inverse-Gamma to allow for
> >>>> heavier tails, but haven't gotten around to it yet (it's not trivial ...)
> >>>>      There seems to be something wonky about the reported sigma/Pearson
> >>>> residuals from truncated nbinom2, but the results are OK ...
> >>>>
> >>>> ### Reproducible data and code:
> >>>> d <- read.csv("https://raw.githubusercontent.com/fpqq/w/main/b.csv")
> >>>> library(glmmTMB)
> >>>> library(purrr)
> >>>> library(broom.mixed)
> >>>> library(dplyr)
> >>>> library(DHARMa)
> >>>>
> >>>> model <- glmmTMB(downs ~ I(ELs/20) +
> >>>>                        ## Document +
> >>>>                        (1|District) +
> >>>>                        (1|Document),
> >>>>                    family=genpois(),
> >>>>                         #  ziformula = ~ Document,
> >>>>                      # dispformula = ~ Document,
> >>>>                    data = d)
> >>>>
> >>>> ## very slow ...
> >>>> system.time(model2 <- update(model, family = "compois",
> >>>>                                 control = glmmTMBControl(parallel = 16)))
> >>>>
> >>>> ##      user    system   elapsed
> >>>> ## 17194.076    11.103  1134.367
> >>>>
> >>>> ## truncated genpois doesn't work for some reason ...
> >>>> fam <- c("genpois", "poisson", "nbinom1", "nbinom2")
> >>>> fam <- c(fam, paste0("truncated_", fam[-1]))
> >>>> mod_list <- lapply(fam,
> >>>>                       function(f) {
> >>>>                           cat(f, "\n")
> >>>>                           update(model, family = f)
> >>>> })
> >>>> names(mod_list) <- fam
> >>>> mod_list[["compois"]] <- model2
> >>>>
> >>>>
> >>>> aictab <- (map_dfr(mod_list, glance, .id = "family")
> >>>>        |> select(-c(nobs,logLik, BIC))
> >>>>        |> mutate(across(c(AIC, deviance), ~ . - min(., na.rm = TRUE)))
> >>>>        |> arrange(AIC)
> >>>> )
> >>>>
> >>>> ##   family              sigma   AIC df.residual deviance
> >>>> ##   <chr>               <dbl> <dbl>       <int>    <dbl>
> >>>> ## 1 truncated_nbinom2 3.66e-8    0          569      NA
> >>>> ## 2 nbinom2           1.39e+0  665.         569       0
> >>>> ## 3 compois           5.29e+9  682.         569      NA
> >>>> ## 4 genpois           1.33e+1  768.         569      NA
> >>>> ## 5 nbinom1           8.11e+0  981.         569     191.
> >>>> ## 6 truncated_poisson 1   e+0 1996.         570      NA
> >>>> ## 7 poisson           1   e+0 2553.         570    2077.
> >>>> ## 8 truncated_nbinom1 7.84e+0   NA          569      NA
> >>>>
> >>>> tnb <- mod_list[["truncated_nbinom2"]]
> >>>> ## Pearson resid calc messed up ???
> >>>> plot(resid(tnb,type="pearson") ~ fitted(tnb))
> >>>>
> >>>> plot(d$downs, predict(tnb, type = "response"), log = "xy")
> >>>> abline(a=0, b=1, col = 2)
> >>>>
> >>>> ## reorder
> >>>> mod_list <- mod_list[aictab$family]
> >>>> op <- par(mfrow = c(3, 3), mfrow = c(3,2,1,1), las = 1)
> >>>> lapply(mod_list,
> >>>>           function(m) plot(resid(m,type="pearson") ~ fitted(m),
> >>>>                            xlab = "", ylab = "", main = family(m)$family)
> >>>>           )
> >>>>
> >>>>
> >>>> ## DHARMa
> >>>> ss <- simulateResiduals(tnb)
> >>>> plot(ss)
> >>>> plotResiduals(ss, form = d$ELs/20)
> >>>>
> >>>>
> >>>>
> >>>> On 2023-05-13 11:55 p.m., Timothy MacKenzie wrote:
> >>>>> Greetings,
> >>>>>
> >>>>> I'm modeling the number of downloads (downs) of certain forms
> >>>>> (Document) by a number of school districts (District) of various
> >>>>> student sizes (ELs).
> >>>>>
> >>>>> I've tried genpois(), nbinom1(), and nbinom2() families but the fitted
> >>>>> vs. residual doesn't look promising.
> >>>>>
> >>>>> In glmmTMB, are there any additional ways to improve the residual
> >>>>> distribution for count-type data?
> >>>>>
> >>>>> Thank you,
> >>>>> Tim M
> >>>>> ### Reproducible data and code:
> >>>>> d = read.csv("https://raw.githubusercontent.com/fpqq/w/main/b.csv")
> >>>>>
> >>>>> model = glmmTMB(downs ~ I(ELs/20) + Document +
> >>>>>                               (1|District) +
> >>>>>                               (1|Document),
> >>>>>                             family=genpois(),
> >>>>>                         #  ziformula = ~ Document,
> >>>>>                      # dispformula = ~ Document,
> >>>>>                             data = d)
> >>>>>
> >>>>> plot(resid(model,type="pearson") ~ fitted(model))
> >>>>>
> >>>>> _______________________________________________
> >>>>> R-sig-mixed-models using r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>
> >>>> --
> >>>> Dr. Benjamin Bolker
> >>>> Professor, Mathematics & Statistics and Biology, McMaster University
> >>>> Director, School of Computational Science and Engineering
> >>>> (Acting) Graduate chair, Mathematics & Statistics
> >>>>    > E-mail is sent at my convenience; I don't expect replies outside of
> >>>> working hours.
> >>>>
> >>>> _______________________________________________
> >>>> R-sig-mixed-models using r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
> >> --
> >> Dr. Benjamin Bolker
> >> Professor, Mathematics & Statistics and Biology, McMaster University
> >> Director, School of Computational Science and Engineering
> >> (Acting) Graduate chair, Mathematics & Statistics
> >>   > E-mail is sent at my convenience; I don't expect replies outside of
> >> working hours.
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> (Acting) Graduate chair, Mathematics & Statistics
>  > E-mail is sent at my convenience; I don't expect replies outside of
> working hours.