[R-sig-ME] predict() expected counts for glmmTMB hurdle negative binomial model

Mon Sep 10 22:04:00 CEST 2018

Hi all,

I am modelling bighorn sheep habitat use using the glmmTMB package with a
hurdle negative binomial model. I understand that a hurdle (or
zero-altered, as same call it) negative binomial model exists as two parts
in sequence. First, I modeled whether habitat is used (coded with '1') or
not used (coded with '0') using a binomial distribution. This first piece
is a logistic regression with a logit link. Secondly, I modeled intensity
of habitat use by considering the conditional distribution, which is
truncated to only include positive counts. I modeled the conditional
distribution using a negative binomial distribution with a log link.

I am hoping to calculate predicted probability of habitat use from the
first portion of the model (the logistic regression) and calculate expected
counts from the truncated negative binomial portion of the model. Then, I
will use these 'expected' results in k-fold cross validation to determine
the predictive ability of my top model(s).

Here is an example of the code for my top model using the glmmTMB package:

model17 <- glmmTMB(bighorn.sheep.sum.rounded ~ pedge + rtp + dimroad + (1 |
year),
                         zi = ~ dihwall + pedge + rtp + dingra + dihroad +
dilake + dimroad + (1 | year),
                         family=list(family="truncated_nbinom2",
link="log"),data=bighorn.k)

I have figured out that specifying "zprob" in predict() returns the
probability of a '0', or probability of failure. Thus, (1-p) is the
probability of a '1', or probability of success, and ultimately,
probability of habitat use (what I want!).

bighorn.k$zprob <- predict(model17, newdata = testdata, type="zprob") #
gives p
bighorn.k$probability.habitat.use <- 1 - bighorn.k$zprob # can calculate
1-p in this way

However, I am still stuck trying to calculate expected counts given the
second portion of the zero-altered model, the truncated conditional
distribution. I can specify either "conditional" or "response" to calculate
mu, or (1-p)*mu, respectively. I can also double check the output for
"response".

bighorn.k$conditional <- predict(model17, newdata = testdata,
type="conditional") # mu
bighorn.k$response <- predict(model17, newdata = testdata, type="response")
# (1-p)*mu
# double check below
bighorn.k$probability.habitat.use * bighorn.k$conditional # equals
bighorn.k$response

And, all of this makes sense with this equation for hurdle models:

E[y|x] = [ 1 - f1(0|x) / 1 - f2(0|x) ] * mu(x); where f1(0|x) = p as I have
described above

>From what I understand, the only piece that I am missing is f2(0|x), which
is the probability of a non-zero in the second untruncated process.
Alternatively, if I can calculate the predicted ratio of probabilities for
observing a non-zero count, I can multiply by mu and get my expected
counts. How do I get either f2(0|x) or the entire ratio from predict(),
using glmmTMB models? Am I missing something or not interpreting something
correctly?

Thanks,
Meghan Beale
University of Alberta

	[[alternative HTML version deleted]]