[R-sig-ME] Help: Interpreting unusual model fit results from generalized linear mixed model (glmmTMB/sjPlot)
Paul Johnson
p@u|@john@on @end|ng |rom g|@@gow@@c@uk
Fri Feb 26 15:21:55 CET 2021
Hi Andre,
I'm not very familiar with sjPlot, so some of this is guesswork...
On the technical question... I think the ICC in tab_model doesn't include the fixed effects, so for your model it's the ID variance divided by the total non-fixed variance, i.e. the ID variance plus the distribution-specific variance. Since ICC = 1 and conditional R2 = 1, the distribution-specific variance must be zero, or at least negligible relative to the ID variance. I'm not sure how tab_model calculates the distribution-specific variance for the negative binomial, or how it accounts for zero-truncation (or zero-inflation), but digging into it a bit, it looks like it's log(1 + 1/lambda + 1/theta), where lambda is the mean and theta the dispersion parameter of the negative binomial (see the help for insight::get_variance, and Table 1 and Appendix S1 of Nakagawa et al.*). This will be close to zero when both parameters are very large. A very large theta implies that the distribution is not overdispersed relative to the Poisson (it could, of course, be underdispersed). A very large lambda implies that the predicted mean of the (untruncated, I think) NB distribution is large. If both of these match what you see in your model fit, that might help to explain your results. You could also calculate the random-effect variance to compare with the distribution-specific variance, but as you have random slopes this isn't straightforward.
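To see how quickly the ICC approaches 1 as lambda and theta grow, here is a minimal base-R sketch of the lognormal approximation quoted above. It assumes a random-intercept-only model, so it uses only the ID intercept variance (τ00 = 1.63) from the output below and ignores the random slope for time, which, with time running to month 69, could add a lot of ID-level variance. The function names are hypothetical, not part of sjPlot or insight, and the lambda/theta grid is made up to show the trend.

```r
# Hypothetical helpers, not sjPlot/insight functions.
# Distribution-specific variance of a log-link negative binomial (NB2) GLMM,
# lognormal approximation: log(1 + 1/lambda + 1/theta).
nb2_dist_var <- function(lambda, theta) {
  log(1 + 1 / lambda + 1 / theta)
}

# Adjusted ICC for a random-intercept-only model:
# ID variance / (ID variance + distribution-specific variance).
icc_intercept <- function(var_id, lambda, theta) {
  var_id / (var_id + nb2_dist_var(lambda, theta))
}

# Illustrative grid: tau00 = 1.63 as reported by tab_model; the lambda and
# theta values are invented to show the trend, not estimates from the model.
grid <- expand.grid(lambda = c(1, 10, 100, 1e4), theta = c(1, 100, 1e6))
grid$dist_var <- nb2_dist_var(grid$lambda, grid$theta)
grid$icc <- icc_intercept(1.63, grid$lambda, grid$theta)
print(round(grid, 4))
```

The ICC only rounds to 1.00 once both 1/lambda and 1/theta are small relative to the ID variance, which is the situation described above.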
Stepping back a bit from the technical question... tab_model outputs all of these numbers by default, but which ones are actually useful to you? I guess the marginal R2 could be useful: it's telling you that the variance of the fixed effects is about a tenth of the random-effect variance, i.e. there's a lot of unexplained variation at the ID level. Interpretation is complicated, though, by the fact that the random-effect variance includes the random slope for time, and time is also one of the fixed effects. Perhaps ICC and R2 just aren't useful here?
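For reference, the Nakagawa et al.* definitions behind the two R2 values and the ICC can be written as follows, with σ²_f the fixed-effect variance, σ²_α the random-effect (ID) variance and σ²_ε the residual variance (here the distribution-specific variance). Setting σ²_ε ≈ 0, as the output suggests, gives the "about a tenth" figure directly from the reported marginal R2 of 0.087.

```latex
\begin{aligned}
R^2_{\mathrm{m}} &= \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
\qquad
R^2_{\mathrm{c}} = \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
\qquad
\mathrm{ICC} = \frac{\sigma^2_\alpha}{\sigma^2_\alpha + \sigma^2_\varepsilon} \\[4pt]
\sigma^2_\varepsilon \approx 0 \;&\Rightarrow\; R^2_{\mathrm{c}} \approx \mathrm{ICC} \approx 1,
\qquad
R^2_{\mathrm{m}} \approx \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha} = 0.087
\;\Rightarrow\;
\frac{\sigma^2_f}{\sigma^2_\alpha} \approx \frac{0.087}{1 - 0.087} \approx 0.095
\end{aligned}
```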
Stepping back a bit further, it looks like your response variable is capped at 28-31 active days per month, depending on the month. Depending on how frequently the participants gamble, I can see why there might not be much overdispersion. I also wonder why the distribution needs to be zero-truncated. Did every participant gamble on at least one day every month? Or do you only get data for a participant in months when they gamble on at least one day? In that case, shouldn't you create zeroes in the other months for these participants? Does the NB fit well, or would a beta distribution work better, with the response being the proportion of gambling days each month (with a small buffer to keep the extremes away from 0 and 1)? It'll depend on how the data are distributed; I can also see why a count distribution might work here.
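A minimal base-R sketch of the two data steps suggested above, under assumptions that do not come from the original data: that dfLong has one row per id-month with columns id, time and daysPlayed; that missing id-months really are months with zero gambling (rather than, say, months before a participant joined, which would need dropping); and that the calendar month of each time point is known. The buffer (days + 0.5) / (n_days + 1) is just one possible choice, and all function names are hypothetical.

```r
# Hypothetical helpers; column names id, time, daysPlayed follow the model formula.

# Add explicit zero rows for id-months with no recorded gambling.
# Time-invariant covariates (ageCategory, gender) would need re-attaching afterwards.
fill_zero_months <- function(df, months) {
  grid <- expand.grid(id = unique(df$id), time = months)
  out <- merge(grid, df, by = c("id", "time"), all.x = TRUE)
  out$daysPlayed[is.na(out$daysPlayed)] <- 0L
  out <- out[order(out$id, out$time), ]
  rownames(out) <- NULL
  out
}

# Number of days in the calendar month containing each date (28-31).
days_in_month <- function(date) {
  first <- as.Date(format(date, "%Y-%m-01"))
  vapply(seq_along(first), function(i) {
    nxt <- seq(first[i], by = "month", length.out = 2)[2]
    as.integer(nxt - first[i])
  }, integer(1))
}

# Proportion of days played, pulled slightly away from 0 and 1 so that a
# beta likelihood can be used: (days + 0.5) / (n_days + 1).
squeeze_prop <- function(days, n_days) {
  (days + 0.5) / (n_days + 1)
}

# Toy example with made-up data; time 0 is assumed to be January 2021.
toy <- data.frame(id = c(1, 1, 2), time = c(0, 2, 1), daysPlayed = c(3L, 31L, 10L))
filled <- fill_zero_months(toy, months = 0:2)
month_starts <- seq(as.Date("2021-01-01"), by = "month", length.out = 3)
filled$nDays <- days_in_month(month_starts[filled$time + 1])
filled$propPlayed <- squeeze_prop(filled$daysPlayed, filled$nDays)
print(filled)
```

propPlayed could then replace daysPlayed as the response in glmmTMB with family = beta_family(); whether that fits better than the truncated NB is an empirical question.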
Hope that helps,
Paul
*Nakagawa et al. (2017). The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. Journal of The Royal Society Interface, 14(134), 20170213. doi: 10.1098/rsif.2017.0213
On 26/02/2021, 10:22, "R-sig-mixed-models on behalf of Andre Syvertsen" <r-sig-mixed-models-bounces using r-project.org on behalf of Andre.Syvertsen using uib.no> wrote:
Hi,
I am working with a large dataset that contains longitudinal data on the gambling behavior of 184,113 participants. The data are based on complete tracking of electronic gambling behavior within a gambling operator. Gambling behavior is aggregated monthly over a total of 70 months. I have an ID variable identifying participants, a time variable (months), and numerous gambling behavior variables, such as active days played in a given month, bets placed in a given month, total losses in a given month, etc. I am investigating the role of age and gender in predicting active gambling days per month.
I have fitted a model with glmmTMB (see below for the model code) and output the resulting statistics with sjPlot's tab_model function, which I am having trouble interpreting. The full results are below. Notably, I appear to have obtained a perfect intra-class correlation. While I am sure that variance in the outcome (active gambling days) is likely to be heavily associated with subject and time, this seems excessive. Furthermore, the pseudo-R2 suggests that 8.7% of the variance is attributable to the fixed effects, which I would have thought would lower the variance attributable to individual/time. Are the results affected by the large number of observations, individuals and/or time points? Or have I specified my model in an odd way?
glmmTMB code for the model:
library(glmmTMB)
DaysPlayedConditionalAgeGenderTruncated <- glmmTMB(daysPlayed ~ 1 + time + ageCategory * gender + (time | id), data = dfLong, family = truncated_nbinom2)
Model summary:
Active Gambling Days Monthly

Predictors                    Incidence Rate Ratios   CI            p
(Intercept)                   1.18                    1.16 – 1.20   <0.001
Time                          0.99                    0.99 – 0.99   <0.001
Age Category 30-39            1.29                    1.26 – 1.32   <0.001
Age Category 40-49            1.81                    1.78 – 1.85   <0.001
Age Category 50-59            2.47                    2.41 – 2.53   <0.001
Age Category 60-69            3.08                    2.99 – 3.17   <0.001
Age Category 70+              3.42                    3.29 – 3.56   <0.001
Gender: Women                 1.69                    1.65 – 1.74   <0.001
Age Category 30-39:Women      0.90                    0.86 – 0.94   <0.001
Age Category 40-49:Women      0.67                    0.65 – 0.70   <0.001
Age Category 50-59:Women      0.53                    0.50 – 0.55   <0.001
Age Category 60-69:Women      0.46                    0.44 – 0.48   <0.001
Age Category 70+:Women        0.45                    0.43 – 0.48   <0.001

Random Effects
σ²                            0.00
τ00 id                        1.63
τ11 id.time                   0.00
ρ01 id                        -0.38
ICC                           1.00
N id                          184113
Observations                  3231544
Marginal R2 / Conditional R2  0.087 / 1.000
Note. Intercept = Men, age 18-29 years, first time point (month 0 of 69).
Kind regards,
André