[R-sig-ME] Truncated Negative Binomial Model Unexpected Marginal Means

Tue Feb 15 17:06:58 CET 2022

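A quick test suggests that emmeans is predicting the response based on 
the mean of the *un*truncated distribution, which would explain why the 
estimated means can fall below 1 even though every count in the 
conditional model is at least 1. (I don't remember, and haven't dug 
into, how emmeans handles this internally.) I don't know if Russ 
Lenth (emmeans maintainer) is reading ...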

n <- 1000
dd <- data.frame(f = factor(rep(1:2, each = n)))
gb <- log(c(2,4))
set.seed(101)
dd <- transform(dd, y = rnbinom(2*n, mu = exp(gb[f]), size = 2))
dd2 <- subset(dd, y > 0)

## un-truncated means
aggregate(y ~ f, data = dd, FUN = mean)
##   f        y
## 1 1 2.047
## 2 2 3.917

## truncated means
aggregate(y ~ f, data = dd2, FUN = mean)
##   f        y
## 1 1 2.781250
## 2 2 4.446084

library(glmmTMB)
library(emmeans)
m1 <- glmmTMB(y ~ f, family = truncated_nbinom2, data = dd2)

## not an exact match, but close to the untruncated means (not the truncated ones)
emmeans(m1, ~ f, type = "response")
##  f response     SE   df lower.CL upper.CL
##  1     2.15 0.0891 1614     1.99     2.34
##  2     3.98 0.1262 1614     3.74     4.23

## untruncated model fitted to the full data: matches the untruncated means exactly
m2 <- glmmTMB(y ~ f, family = nbinom2, data = dd)
emmeans(m2, ~ f, type = "response")
##  f response     SE   df lower.CL upper.CL
##  1     2.05 0.0651 1997     1.92     2.18
##  2     3.92 0.1094 1997     3.71     4.14
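
For what it's worth, here is a minimal sketch of how one could convert an untruncated mean into the corresponding zero-truncated mean for the nbinom2 parameterization, using E[Y | Y > 0] = mu / (1 - P(Y = 0)). The helper name trunc_nb2_mean is made up for illustration (it is not part of glmmTMB or emmeans), and the values plugged in are the true simulation parameters from above, not fitted estimates.

```r
## Hypothetical helper (not from glmmTMB/emmeans): mean of a zero-truncated
## negative binomial (NB2), given the untruncated mean `mu` and `size`.
trunc_nb2_mean <- function(mu, size) {
  p0 <- dnbinom(0, mu = mu, size = size)  # P(Y = 0) under the untruncated NB2
  mu / (1 - p0)                           # E[Y | Y > 0]
}

## true simulation values used above: mu = c(2, 4), size = 2
trunc_nb2_mean(mu = c(2, 4), size = 2)
## [1] 2.666667 4.500000
```

These are reasonably close to the observed truncated means (2.78, 4.45). For a fitted nbinom2-type model one could presumably plug in the emmeans response values for mu and sigma(m1) for size, on the assumption that sigma() returns the nbinom2 dispersion parameter; I haven't checked how this carries over to truncated_nbinom1.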

On 2/15/22 10:04 AM, Alex Waldman wrote:
> Dear All,
> 
> Hope all is well! This may be a naïve question, but I am running a hurdle negative binomial model to look at differences in counts between types and locations. My major interest is the conditional model (i.e., when counts are above 0).
> 
> I run the following code:
> 
> model <- glmmTMB(Count ~ Location*Type + (1 | ID),
>                  zi = ~ Location*Type + (1 | ID),
>                  data = data, family = "truncated_nbinom1",
>                  control = glmmTMBControl(optimizer = optim,
>                                           optArgs = list(method = "BFGS")))
> 
> var.corr <-VarCorr(model)
> 
> Conditional model:
> Groups Name        Std.Dev.
> ID     (Intercept) 0.37105
> 
> Zero-inflation model:
> Groups Name        Std.Dev.
> ID     (Intercept) 1.3207
> 
> emmeans <- emmeans(model, ~ Location*Type, type="response", sigma=0.37105, bias.adjust=TRUE)
> 
> Location Type response    SE  df lower.CL upper.CL
> 0        0       1.117 0.277 631    0.687     1.82
> 1        0       0.940 0.251 631    0.556     1.59
> 2        0       0.893 0.266 631    0.498     1.60
> 0        1       1.325 0.254 631    0.909     1.93
> 1        1       1.090 0.248 631    0.698     1.70
> 2        1       1.452 0.300 631    0.967     2.18
> 
> Confidence level used: 0.95
> Intervals are back-transformed from the log scale
> Bias adjustment applied based on sigma = 0.37105
> 
> However, I’m not sure why the estimated means and confidence intervals include values below 1 in the conditional model, as I anticipated these values would represent the average of the non-zero counts. Is there something I may be doing wrong or not understanding?
> 
> Thanks in advance for your help!
> 
> Warm Regards,
> Alex

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics