[R-sig-ME] Mixed effects model with many zeros

Fri Jun 5 00:35:49 CEST 2020

No, to me that does not seem like a reasonable way to analyze the data you describe, although I'm coming from a very different field so I could be off base.  The zero inflation and skew are artifacts of treating ordinal categories as numeric, and category "0" as numeric 0.

Is there a reason you don't want to model your ordered categorical response as ordered categorical?  There could be something about patterns across the 10 questions I'm missing, but for what you describe, perhaps ordinal::clmm() or something in the mixor package would do what you need.  
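As a rough illustration of the ordinal::clmm() route, here is a minimal sketch on simulated data. It assumes the responses are stored in long format, one row per participant, day and item; the columns "item" and "response" and all simulated values are hypothetical, and only MRID, daynum and enjoynat_c are names taken from the original post. The crossed random intercept for item is one possible choice, not something the data description dictates.

```r
# Sketch only: simulated data; "item" and "response" are hypothetical names.
library(ordinal)

set.seed(1)
n_id   <- 40   # participants (MRID in the original post)
n_day  <- 3    # days per participant
n_item <- 10   # negative-affect items

# One row per participant-day, then cross with the 10 items
days <- expand.grid(daynum = seq_len(n_day), MRID = factor(seq_len(n_id)))
days$enjoynat_c <- rnorm(nrow(days))
long <- merge(days, data.frame(item = factor(seq_len(n_item))))  # Cartesian product

# Latent-variable simulation: person and item intercepts plus fixed effects
u_id   <- rnorm(n_id, sd = 1)
u_item <- rnorm(n_item, sd = 0.5)
eta <- u_id[as.integer(long$MRID)] + u_item[as.integer(long$item)] -
  0.10 * long$daynum - 0.15 * long$enjoynat_c
latent <- eta + rlogis(nrow(long))

# Cut the latent scale into ordered categories 0-4; most mass lands in "0"
long$response <- cut(latent, breaks = c(-Inf, 1, 2, 3, 4, Inf),
                     labels = as.character(0:4), ordered_result = TRUE)

# Cumulative link mixed model (logit link by default)
fit <- clmm(response ~ enjoynat_c + daynum + (1 | MRID) + (1 | item),
            data = long)
summary(fit)
```

The response has to be an ordered factor, so category "0" is just the lowest level and no zero-inflation component is needed.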

Note that glmmTMB is unlikely to add ordinal responses:
https://github.com/glmmTMB/glmmTMB/issues/514

Tom

-----Original Message-----
From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> On Behalf Of Austen Anderson via R-sig-mixed-models
Sent: Thursday, June 4, 2020 2:49 PM
To: r-sig-mixed-models using r-project.org
Subject: [EXTERNAL] [R-sig-ME] Mixed effects model with many zeros

Hi, I've got a set of longitudinal data with negative affect as the dependent variable. Negative affect was measured by 10 items asking about how much of the day the participant felt 10 different negative emotions (ordinal scale from 0-4). The modal response to that survey was 0 for all ten items, resulting in a large number of zeros for that variable along with a strong right skew. I've been exploring CrossValidated and other sources to get a sense of what my options are for modeling these data. I've read about Tweedie models, Tobit (censored) models, hurdle models, beta distribution models, and zero-inflated gamma models. As far as I could understand, the Tweedie model seemed reasonable and I modeled it this way:
neg_nat_mod_tweed <- glmmTMB(negaff ~ enjoynat_c + enjoynat_mean + daynum + (1|MRID),
                             data = daily,
                             family = tweedie)
summary(neg_nat_mod_tweed)

Family: tweedie  ( log )
Formula:          negaff ~ enjoynat_c + enjoynat_mean + daynum + (1 | MRID)
Data: daily

     AIC      BIC   logLik deviance df.resid
  4637.0   4683.6  -2311.5   4623.0     5753

Random effects:

Conditional model:
 Groups Name        Variance Std.Dev.
 MRID   (Intercept) 1.19     1.091
Number of obs: 5760, groups:  MRID, 782

Overdispersion parameter for tweedie family (): 0.248

Conditional model:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.653574   0.075059 -22.030  < 2e-16 ***
enjoynat_c    -0.132666   0.033037  -4.016 5.93e-05 ***
enjoynat_mean  0.009201   0.134737   0.068    0.946
daynum        -0.087213   0.005578 -15.636  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I have a few questions. First, does this seem like a reasonable way to analyze these data? If not, do you have other recommendations? Second, while the manual for glmmTMB lists the Tweedie family as an option, the vignette (https://cran.r-project.org/web/packages/glmmTMB/vignettes/glmmTMB.pdf) says it is not yet implemented. Does anyone know if this model is trustworthy? Lastly, the output says that the link function is log. I am still learning how link functions work and I am not sure how to make sense of the coefficients, because in their current form the negative intercept makes no sense. Can you offer some guidance on interpretation?
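On the interpretation question, a general note about log links may help: the coefficients are on the log scale of the expected response, so a negative intercept only means that the expected value at the reference point is below 1, not below 0. A minimal sketch in base R, using the fixed-effect estimates copied from the summary output in this message (the object names est and mult are illustrative only):

```r
# Fixed-effect estimates copied from the summary output in this message
est <- c("(Intercept)"  = -1.653574,
         enjoynat_c     = -0.132666,
         enjoynat_mean  =  0.009201,
         daynum         = -0.087213)

# Back-transform from the log scale to the response scale
mult <- exp(est)
round(mult, 3)

# exp(intercept): expected negaff when all predictors are 0, for a
# participant whose random intercept is 0 (a conditional, not marginal, mean).
# exp(slope): multiplicative change in expected negaff per one-unit increase.
pct_change_per_day <- 100 * (mult[["daynum"]] - 1)
pct_change_per_day
```

Read this way, the intercept corresponds to an expected negaff of roughly 0.19 at the reference point, and each additional day multiplies the expected value by a factor a little below 1. This says nothing about whether the Tweedie model itself suits ordinal item scores.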
Thank you for your time,
Austen

_______________________________________________
R-sig-mixed-models using r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models