[R-sig-ME] Truncated Negative Binomial Model Unexpected Marginal Means

Tue Feb 15 20:20:02 CET 2022

Thanks, Ben!

I've copied a response from Russ:

Support for emmeans for glmmTMB models is in the glmmTMB package. However, I believe those estimates are based on the linear predictor for the conditional model only. That is, these are estimates of the mean of the fitted negative binomial distribution before it is truncated. Perhaps Ben can confirm this.

I think estimated means for hurdle and zero-inflated models are very difficult to interpret without taking into account both parts of the model; and, unfortunately, it is a somewhat complicated matter to do so that is not currently implemented. Seems worthwhile to try, though. One would have to include all the factors that affect both models; then from the truncated part, estimate its mean M and its probability of zero PT0. And from the ZI part, estimate its probability of zero PZ0. Then the estimated mean of the response is M * (1 - PZ0) / (1 - PT0), unless I made some kind of stupid mistake. The tricky part is estimating the SE of this result.

In addition, here is some example data attached. I ran the following code:

data<-read.csv("Example.csv",header=T)
as.factor(data$ID)->data$ID
as.factor(data$Location)->data$Location
as.factor(data$Type)->data$Type
model<-glmmTMB(Count ~ LocationType + (1 | ID), zi=~LocationType + (1|ID), data=data, family="truncated_nbinom1",control=glmmTMBControl(optimizer=optim, optArgs=list(method="BFGS")))

var.corr <-VarCorr(model)

Conditional model:
Groups Name Std.Dev.
ID (Intercept) 0.37105

Zero-inflation model:
Groups Name Std.Dev.
ID (Intercept) 1.3207

emmeans <- emmeans(model, ~ Location*Type, type="response", sigma=0.37105, bias.adjust=TRUE)

Location Type response SE df lower.CL upper.CL
0 0 1.117 0.277 631 0.687 1.82
1 0 0.940 0.251 631 0.556 1.59
2 0 0.893 0.266 631 0.498 1.60
0 1 1.325 0.254 631 0.909 1.93
1 1 1.090 0.248 631 0.698 1.70
2 1 1.452 0.300 631 0.967 2.18

Confidence level used: 0.95
Intervals are back-transformed from the log scale
Bias adjustment applied based on sigma = 0.37105

Interestingly, I tabulated the mean for location0/type1 including 0s and without 0s. Including 0s=1.05 and without 0s=1.88. The emmeans output seems to be in the middle of those two.

Does anyone have any workaround suggestions on how to tabulate the means of the conditional model above a threshold (ie above 0)? This page seemed to be relevant: https://www.theanalysisfactor.com/getting-accurate-predicted-counts-when-there-are-no-zeros-in-the-data/ but I wasn't sure what R package would fit my needs.

Thanks for your help!

Warm Regards,
Alex 

On 2/15/22, 4:08 PM, "R-sig-mixed-models on behalf of Ben Bolker" <r-sig-mixed-models-bounces using r-project.org on behalf of bbolker using gmail.com> wrote:

    A quick test suggests that emmeans is predicting the response based on 
    the mean of the *un*truncated distribution (I don't remember and/or 
    haven't looked into all of the guts of emmeans).  Don't know if Russ 
    Lenth (emmeans maintainer) is reading ...

    n <- 1000
    dd <- data.frame(f = factor(rep(1:2, each = n)))
    gb <- log(c(2,4))
    set.seed(101)
    dd <- transform(dd, y = rnbinom(2*n, mu = exp(gb[f]), size = 2))
    dd2 <- subset(dd, y > 0)

    ## un-truncated means
    aggregate(y ~ f, data = dd, FUN = mean)
    ##   f        y
    ## 1 1 2.047
    ## 2 2 3.917

    ## truncated means
    aggregate(y ~ f, data = dd2, FUN = mean)
    ##   f        y
    ## 1 1 2.781250
    ## 2 2 4.446084

    library(glmmTMB)
    library(emmeans)
    m1 <- glmmTMB(y ~ f, family = truncated_nbinom2, data = dd2)

    ## doesn't match exactly but close to untruncated means
    emmeans(m1, ~ f, type = "response")
    ##  f response     SE   df lower.CL upper.CL
    ##  1     2.15 0.0891 1614     1.99     2.34
    ##  2     3.98 0.1262 1614     3.74     4.23

    ## matches means exactly
    m2 <- glmmTMB(y ~ f, family = nbinom2, data = dd)
    emmeans(m2, ~ f, type = "response")
    ##  f response     SE   df lower.CL upper.CL
    ##  1     2.05 0.0651 1997     1.92     2.18
    ##  2     3.92 0.1094 1997     3.71     4.14

    On 2/15/22 10:04 AM, Alex Waldman wrote:
    > Dear All,
    > 
    > Hope all is well! This may be a naïve question but I am running a hurdle negative binomial model to look at the differences in counts of differing types in different locations. My major interest is the conditional model (ie when counts are above 0).
    > 
    > I run the following code:
    > 
    > model<-glmmTMB(Count ~ Location*Type + (1 | ID), zi=~Location*Type + (1|ID), data=data, family="truncated_nbinom1",control=glmmTMBControl(optimizer=optim, optArgs=list(method="BFGS")))
    > 
    > var.corr <-VarCorr(model)
    > 
    > Conditional model:
    > Groups Name        Std.Dev.
    > ID     (Intercept) 0.37105
    > 
    > Zero-inflation model:
    > Groups Name        Std.Dev.
    > ID     (Intercept) 1.3207
    > 
    > emmeans <- emmeans(model, ~ Location*Type, type="response", sigma=0.37105, bias.adjust=TRUE)
    > 
    > Location Type response    SE  df lower.CL upper.CL
    > 0     0             1.117 0.277 631    0.687     1.82
    > 1     0             0.940 0.251 631    0.556     1.59
    > 2     0             0.893 0.266 631    0.498     1.60
    > 0     1             1.325 0.254 631    0.909     1.93
    > 1     1             1.090 0.248 631    0.698     1.70
    > 2     1             1.452 0.300 631    0.967     2.18
    > 
    > Confidence level used: 0.95
    > Intervals are back-transformed from the log scale
    > Bias adjustment applied based on sigma = 0.37105
    > 
    > However, I’m not sure why the estimated means and confidence intervals will include values below 1 in the conditional model as I anticipated these values would represent the average number of non-zero counts? Is there something I may be doing wrong or not understanding?
    > 
    > Thanks in advance for your help!
    > 
    > Warm Regards,
    > Alex
    > 
    > 	[[alternative HTML version deleted]]
    > 
    > _______________________________________________
    > R-sig-mixed-models using r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

    -- 
    Dr. Benjamin Bolker
    Professor, Mathematics & Statistics and Biology, McMaster University
    Director, School of Computational Science and Engineering
    (Acting) Graduate chair, Mathematics & Statistics

    _______________________________________________
    R-sig-mixed-models using r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models