[R-sig-ME] glmmTMB: biased marginal predictions in zero-inflated negative binomial mixed models

Sun Feb 16 15:48:05 CET 2020

I have used zero-inflated negative binomial models to model tree
ingrowth in forests. The data set contains 1255 observations from 850
sample plots, each plot having 1 or 2 measurements. The purpose is to use
the models outside the estimation data, so the marginal prediction is of
main interest. I’m using glmmTMB with the nbinom2 family and maximum
likelihood estimation. When I estimate models without random effects, I
get unbiased marginal predictions, and the estimated right-censored
distribution agrees exactly with the empirical distribution. There is very
high overdispersion: the overdispersion parameter is around 0.4.
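
To make the scale of the overdispersion parameter concrete, here is a small
sketch in base R of the nbinom2 variance function, Var(y) = mu + mu^2/theta.
The function name nb2_var is made up for illustration. The mean of 0.9 is
the average pine ingrowth count quoted further down, and theta = 4 is the
value reported further down for the mixed model; applying them to a single
common mean is a simplification, not a statement about the fitted models.

```r
# Variance of the quadratic (nbinom2) negative binomial parameterization:
# Var(y) = mu * (1 + mu / theta). Smaller theta means more overdispersion.
nb2_var <- function(mu, theta) {
  mu * (1 + mu / theta)
}

mu <- 0.9  # average count, used here only as an illustrative mean

var_theta_04 <- nb2_var(mu, theta = 0.4)  # 0.9 * (1 + 2.25)  = 2.925
var_theta_4  <- nb2_var(mu, theta = 4)    # 0.9 * (1 + 0.225) = 1.1025

# Ratio of variance to mean (1 would be Poisson-like)
c(theta_0.4 = var_theta_04 / mu, theta_4 = var_theta_4 / mu)
```

With theta = 0.4 the variance is about 3.25 times the mean, while with
theta = 4 it is only about 1.2 times the mean.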

When I add random plot effects to the conditional model, there are two
problems. The first concerns estimation. If I just add a random plot effect
to the optimal fixed-effects model, the estimation does not converge. Using
the parameter values obtained from the fixed-effects estimation as starting
values, the estimation converges. A reasonable-looking model with higher
likelihood is obtained; all predictors in the conditional model remain
significant, as do all predictors except one in the zero-inflation model.
The overdispersion parameter increases to around 4. When I then drop all
predictors from the zero-inflation part except the intercept, the
likelihood increases even though it should decrease. Starting from the
intercept-only model, one can obtain a quite simple model which has higher
likelihood than more complicated models in which all predictors are
nevertheless very significant. The simpler models have a larger variance
for the plot effects and a larger overdispersion parameter (i.e. smaller
overdispersion), as seems logical. In any case, it seems that
better-fitting models with more parameters do not have higher likelihoods,
as they should.
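
The workflow described above can be sketched as follows. This is only an
illustration under stated assumptions: the data are simulated stand-ins
(the variables plot, x and count and all coefficient values are invented,
not the ingrowth data), and only the conditional and zero-inflation
coefficients are passed as starting values. Because a nested model cannot
have a higher maximized likelihood than the model that contains it, a
simpler fit with a higher log-likelihood suggests that the larger fit
stopped short of its optimum, so the convergence code and Hessian check are
worth inspecting.

```r
library(glmmTMB)

set.seed(1)
# Simulated stand-in data (hypothetical): 300 plots, 2 measurements each
n_plot <- 300
dat <- data.frame(plot = factor(rep(seq_len(n_plot), each = 2)),
                  x = rnorm(2 * n_plot))
b <- rnorm(n_plot, sd = 0.7)[as.integer(dat$plot)]        # plot effects
mu <- exp(0.2 + 0.5 * dat$x + b)                          # conditional mean
zero <- rbinom(nrow(dat), 1, plogis(-0.5 + 0.8 * dat$x))  # structural zeros
dat$count <- ifelse(zero == 1, 0, rnbinom(nrow(dat), mu = mu, size = 2))

# Step 1: fixed-effects-only zero-inflated negative binomial model
fit_fixed <- glmmTMB(count ~ x, ziformula = ~ x,
                     family = nbinom2, data = dat)

# Step 2: reuse its coefficients as starting values for the mixed model
start_vals <- list(beta   = unname(fixef(fit_fixed)$cond),
                   betazi = unname(fixef(fit_fixed)$zi))
fit_full <- glmmTMB(count ~ x + (1 | plot), ziformula = ~ x,
                    family = nbinom2, data = dat, start = start_vals)

# Step 3: nested model with an intercept-only zero-inflation part
fit_simple <- glmmTMB(count ~ x + (1 | plot), ziformula = ~ 1,
                      family = nbinom2, data = dat)

# At the true optima, logLik(fit_full) >= logLik(fit_simple) must hold
logLik(fit_full)
logLik(fit_simple)

# Optimizer convergence code (0 = success) and Hessian check
fit_full$fit$convergence
fit_full$sdr$pdHess
```

If the simpler fit comes out ahead, one option is to refit the larger model
starting from the simpler model's estimates, with the extra coefficients
started at zero.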

The second problem concerns the marginal predictor. I have considered two
predictors. The first is:

(1-1/(1+exp(-linear zi-predictor)))*exp(linear predictor for the conditional
model)

In the second predictor, 0.5*var(random plot effect) is added to the linear
predictor of the conditional model. The first predictor is biased
downwards, as expected. But the second predictor is biased upwards, and the
bias is worse for the simpler models with higher likelihoods. For instance,
the average pine ingrowth count is 0.9, whereas the average prediction is
4.2 with the optimal simple model and 1.7 with a more complicated model
with lower likelihood. The standard deviations of the residuals are also
smaller in the fixed-effects models than in the mixed-effects models with
higher likelihoods.
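
The two predictors can be written as small base R functions and checked
against simulated data. This is a minimal sketch, assuming a logit link for
the zero-inflation part, a log link for the conditional part, and normally
distributed plot effects. The function names and all parameter values are
hypothetical. Under these assumptions the first predictor should fall below
the observed mean and the second should match it.

```r
# Predictor 1: ignores the random plot effect (sets it to zero)
pred_naive <- function(eta_zi, eta_cond) {
  (1 - plogis(eta_zi)) * exp(eta_cond)
}

# Predictor 2: adds 0.5 * var(random plot effect) to the linear predictor
pred_corrected <- function(eta_zi, eta_cond, sigma2) {
  (1 - plogis(eta_zi)) * exp(eta_cond + 0.5 * sigma2)
}

set.seed(42)
n <- 200000
eta_zi <- 0; eta_cond <- 0  # hypothetical linear predictors
sigma2 <- 1                 # hypothetical variance of the plot effect
theta  <- 4                 # hypothetical nbinom2 dispersion parameter

# Simulate zero-inflated negative binomial counts with normal plot effects
b <- rnorm(n, sd = sqrt(sigma2))
structural_zero <- rbinom(n, 1, plogis(eta_zi))
y <- ifelse(structural_zero == 1, 0,
            rnbinom(n, mu = exp(eta_cond + b), size = theta))

c(observed  = mean(y),
  naive     = pred_naive(eta_zi, eta_cond),
  corrected = pred_corrected(eta_zi, eta_cond, sigma2))
```

The correction term is exact only when the plot effects are normal with the
stated variance, so a corrected predictor that overshoots points to a
violated distributional assumption or an overestimated variance.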

I suspect that the strange behavior of the mixed models is caused by the
random effects having a very non-normal distribution. Do you have a
better explanation?
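
One way to see how non-normal random effects could produce an upward bias
is the following base R sketch. It compares the multiplier exp(0.5*var),
which assumes normality, with the actual mean of exp(b) for a left-skewed
effect b that has the same mean and variance. The shifted exponential used
here is a hypothetical shape chosen only because its exact answer is known;
nothing in the post says the plot effects follow it.

```r
set.seed(7)
n <- 100000
sigma <- 1  # hypothetical standard deviation of the plot effect

# Two distributions with mean 0 and variance sigma^2
b_normal <- rnorm(n, sd = sigma)
b_skewed <- sigma * (1 - rexp(n))  # left-skewed, bounded above by sigma

# Multiplier implied by the normal assumption: exp(0.5) is about 1.65
factor_assumed <- exp(0.5 * sigma^2)

# Actual multipliers E[exp(b)], estimated by Monte Carlo
factor_normal <- mean(exp(b_normal))  # close to factor_assumed
factor_skewed <- mean(exp(b_skewed))  # exact value exp(1)/2, about 1.36

c(assumed = factor_assumed, normal = factor_normal, skewed = factor_skewed)
```

In this constructed case the normal-theory correction overstates the mean
by roughly 20 percent, and the gap widens as the variance grows.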

Juha
