[R-sig-ME] Truncated Negative Binomial Model Unexpected Marginal Means

Wed Mar 2 06:23:48 CET 2022

If I'm understanding the problem correctly....

I believe emmeans will allow you to pass the reference grid over which
things are computed.

On 15/2/22 1:20 pm, Alex Waldman wrote:
> Thanks, Ben!
> 
> I've copied a response from Russ:
> 
> Support for emmeans for glmmTMB models is in the glmmTMB package. However, I believe those estimates are based on the linear predictor for the conditional model only. That is, these are estimates of the mean of the fitted negative binomial distribution before it is truncated. Perhaps Ben can confirm this.
> 
> I think estimated means for hurdle and zero-inflated models are very difficult to interpret without taking into account both parts of the model; and, unfortunately, it is a somewhat complicated matter to do so that is not currently implemented. Seems worthwhile to try, though. One would have to include all the factors that affect both models; then from the truncated part, estimate its mean M and its probability of zero PT0. And from the ZI part, estimate its probability of zero PZ0. Then the estimated mean of the response is M * (1 - PZ0) / (1 - PT0), unless I made some kind of stupid mistake. The tricky part is estimating the SE of this result.
> 
> In addition, here is some example data attached. I ran the following code:
> 
> data<-read.csv("Example.csv",header=T)
> as.factor(data$ID)->data$ID
> as.factor(data$Location)->data$Location
> as.factor(data$Type)->data$Type
> model<-glmmTMB(Count ~ LocationType + (1 | ID), zi=~LocationType + (1|ID), data=data, family="truncated_nbinom1",control=glmmTMBControl(optimizer=optim, optArgs=list(method="BFGS")))
> 
> var.corr <-VarCorr(model)
> 
> Conditional model:
> Groups Name Std.Dev.
> ID (Intercept) 0.37105
> 
> Zero-inflation model:
> Groups Name Std.Dev.
> ID (Intercept) 1.3207
> 
> emmeans <- emmeans(model, ~ Location*Type, type="response", sigma=0.37105, bias.adjust=TRUE)
> 
> Location Type response SE df lower.CL upper.CL
> 0 0 1.117 0.277 631 0.687 1.82
> 1 0 0.940 0.251 631 0.556 1.59
> 2 0 0.893 0.266 631 0.498 1.60
> 0 1 1.325 0.254 631 0.909 1.93
> 1 1 1.090 0.248 631 0.698 1.70
> 2 1 1.452 0.300 631 0.967 2.18
> 
> Confidence level used: 0.95
> Intervals are back-transformed from the log scale
> Bias adjustment applied based on sigma = 0.37105
> 
> Interestingly, I tabulated the mean for location0/type1 including 0s and without 0s. Including 0s=1.05 and without 0s=1.88. The emmeans output seems to be in the middle of those two.
> 
> Does anyone have any workaround suggestions on how to tabulate the means of the conditional model above a threshold (ie above 0)? This page seemed to be relevant: https://www.theanalysisfactor.com/getting-accurate-predicted-counts-when-there-are-no-zeros-in-the-data/ but I wasn't sure what R package would fit my needs.
> 
> Thanks for your help!
> 
> Warm Regards,
> Alex 
> 
> On 2/15/22, 4:08 PM, "R-sig-mixed-models on behalf of Ben Bolker" <r-sig-mixed-models-bounces using r-project.org on behalf of bbolker using gmail.com> wrote:
> 
>     A quick test suggests that emmeans is predicting the response based on 
>     the mean of the *un*truncated distribution (I don't remember and/or 
>     haven't looked into all of the guts of emmeans).  Don't know if Russ 
>     Lenth (emmeans maintainer) is reading ...
> 
>     n <- 1000
>     dd <- data.frame(f = factor(rep(1:2, each = n)))
>     gb <- log(c(2,4))
>     set.seed(101)
>     dd <- transform(dd, y = rnbinom(2*n, mu = exp(gb[f]), size = 2))
>     dd2 <- subset(dd, y > 0)
> 
>     ## un-truncated means
>     aggregate(y ~ f, data = dd, FUN = mean)
>     ##   f        y
>     ## 1 1 2.047
>     ## 2 2 3.917
> 
>     ## truncated means
>     aggregate(y ~ f, data = dd2, FUN = mean)
>     ##   f        y
>     ## 1 1 2.781250
>     ## 2 2 4.446084
> 
>     library(glmmTMB)
>     library(emmeans)
>     m1 <- glmmTMB(y ~ f, family = truncated_nbinom2, data = dd2)
> 
>     ## doesn't match exactly but close to untruncated means
>     emmeans(m1, ~ f, type = "response")
>     ##  f response     SE   df lower.CL upper.CL
>     ##  1     2.15 0.0891 1614     1.99     2.34
>     ##  2     3.98 0.1262 1614     3.74     4.23
> 
>     ## matches means exactly
>     m2 <- glmmTMB(y ~ f, family = nbinom2, data = dd)
>     emmeans(m2, ~ f, type = "response")
>     ##  f response     SE   df lower.CL upper.CL
>     ##  1     2.05 0.0651 1997     1.92     2.18
>     ##  2     3.92 0.1094 1997     3.71     4.14
> 
> 
>     On 2/15/22 10:04 AM, Alex Waldman wrote:
>     > Dear All,
>     > 
>     > Hope all is well! This may be a naïve question but I am running a hurdle negative binomial model to look at the differences in counts of differing types in different locations. My major interest is the conditional model (ie when counts are above 0).
>     > 
>     > I run the following code:
>     > 
>     > model<-glmmTMB(Count ~ Location*Type + (1 | ID), zi=~Location*Type + (1|ID), data=data, family="truncated_nbinom1",control=glmmTMBControl(optimizer=optim, optArgs=list(method="BFGS")))
>     > 
>     > var.corr <-VarCorr(model)
>     > 
>     > Conditional model:
>     > Groups Name        Std.Dev.
>     > ID     (Intercept) 0.37105
>     > 
>     > Zero-inflation model:
>     > Groups Name        Std.Dev.
>     > ID     (Intercept) 1.3207
>     > 
>     > emmeans <- emmeans(model, ~ Location*Type, type="response", sigma=0.37105, bias.adjust=TRUE)
>     > 
>     > Location Type response    SE  df lower.CL upper.CL
>     > 0     0             1.117 0.277 631    0.687     1.82
>     > 1     0             0.940 0.251 631    0.556     1.59
>     > 2     0             0.893 0.266 631    0.498     1.60
>     > 0     1             1.325 0.254 631    0.909     1.93
>     > 1     1             1.090 0.248 631    0.698     1.70
>     > 2     1             1.452 0.300 631    0.967     2.18
>     > 
>     > Confidence level used: 0.95
>     > Intervals are back-transformed from the log scale
>     > Bias adjustment applied based on sigma = 0.37105
>     > 
>     > However, I’m not sure why the estimated means and confidence intervals will include values below 1 in the conditional model as I anticipated these values would represent the average number of non-zero counts? Is there something I may be doing wrong or not understanding?
>     > 
>     > Thanks in advance for your help!
>     > 
>     > Warm Regards,
>     > Alex
>     > 
>     > 	[[alternative HTML version deleted]]
>     > 
>     > _______________________________________________
>     > R-sig-mixed-models using r-project.org mailing list
>     > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
>     -- 
>     Dr. Benjamin Bolker
>     Professor, Mathematics & Statistics and Biology, McMaster University
>     Director, School of Computational Science and Engineering
>     (Acting) Graduate chair, Mathematics & Statistics
> 
>     _______________________________________________
>     R-sig-mixed-models using r-project.org mailing list
>     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>