[R-sig-ME] Model specification/family for a continuous/proportional response with many zeros
Michael Lawson
Mon May 17 15:45:16 CEST 2021
Hi Thierry,
Thank you for your advice and speedy response.
Most of the data are close to the lower bound (0), e.g. the mean time for
group A in zone A is 15.1 seconds and for group A in zone B is 3.8 seconds.
However, there are a very small number of outliers near the upper bound, the
largest being 294 out of the 300 seconds (see the attached file if you want
to look at the data).
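A minimal sketch of how a proportion response such as prop_time could be built from the recorded times, assuming a wide table with one row per observation period and columns time_at_zone_A and time_at_zone_B (names borrowed from the cbind() formula in the original post). The data frames `wide` and `long` and all values in them are hypothetical toy data, not the attached dataset.

```r
# Hypothetical wide data: one row per observation period, times in seconds.
wide <- data.frame(
  id             = factor(c("a1", "a1", "b1", "b1")),
  group          = factor(c("A", "A", "B", "B")),
  time_at_zone_A = c(12, 0, 0, 294),
  time_at_zone_B = c(4, 0, 9, 0)
)
total_time <- 300  # seconds per observation period, as stated in the thread

# Stack the two zone columns into a single response column.
long <- data.frame(
  id        = rep(wide$id, times = 2),
  group     = rep(wide$group, times = 2),
  zone      = factor(rep(c("zone_A", "zone_B"), each = nrow(wide))),
  prop_time = c(wide$time_at_zone_A, wide$time_at_zone_B) / total_time
)

# The beta density lives on the open interval (0, 1): exact zeros have to be
# handled by the zero-inflation part, and an exact 1 would need separate treatment.
stopifnot(all(long$prop_time >= 0 & long$prop_time < 1))
```

With this layout each observation period contributes two rows (one per zone), so the zone effect and the group:zone interaction can be estimated from the same response column.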
I have taken a stab at running a Zero-inflated Beta GLMM using glmmTMB in R
like so:
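library(glmmTMB)
betta_mod <- glmmTMB(prop_time ~ group*zone + (1|id),
                     ziformula = ~1,  # intercept-only zero-inflation, as reported in the output ("Zero inflation: ~1")
                     family = beta_family(),
                     data = glmm_zone_data)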
 Family: beta  ( logit )
Formula:          prop_time ~ group * zone + (1 | id)
Zero inflation:             ~1
Data: glmm_zone_data

     AIC      BIC   logLik deviance df.resid
  -763.6   -736.3    388.8   -777.6      359

Random effects:

Conditional model:
 Groups Name        Variance  Std.Dev.
 id     (Intercept) 2.386e-09 4.885e-05
Number of obs: 366, groups:  id, 14

Overdispersion parameter for beta family (): 13.1

Conditional model:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)        -2.7685     0.1031 -26.844  < 2e-16 ***
groupB             -0.4455     0.1498  -2.975 0.002932 **
zonezone_B         -0.4179     0.1524  -2.741 0.006124 **
groupB:zonezone_B   0.8443     0.2190   3.855 0.000116 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Zero-inflation model:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.1804     0.1233  -9.575   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Does this look like the correct way of specifying the model? I am a little
confused about specifying and interpreting the zero-inflation component - I
have only just begun reading about this.
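As a rough guide to reading the two sets of coefficients, here is a minimal sketch that back-transforms the intercepts reported in this message. It assumes the usual glmmTMB convention that the zero-inflation model is on the logit scale and gives the probability of a zero, that the conditional model describes the non-zero proportions, and that the random intercept is at zero (a typical individual). The variable names are made up for illustration.

```r
# Coefficients copied from the summary output in this message (logit scale).
zi_intercept   <- -1.1804  # zero-inflation model, (Intercept)
cond_intercept <- -2.7685  # conditional model, (Intercept): group A, zone_A

# plogis() is the inverse logit: exp(x) / (1 + exp(x)).
p_zero    <- plogis(zi_intercept)    # estimated probability that an observation is zero
mean_prop <- plogis(cond_intercept)  # mean proportion among the non-zero observations

# Implied mean time in seconds for group A in zone A, zeros included.
overall_seconds <- (1 - p_zero) * mean_prop * 300

round(c(p_zero = p_zero, mean_prop = mean_prop, overall_seconds = overall_seconds), 3)
```

Under these assumptions roughly 23% of observations are estimated to be zeros, and the non-zero observations for group A in zone A average about 6% of the 300 s period. Because ziformula = ~1 has only an intercept, the zero probability is the same for every group and zone; a formula such as ~ group*zone would let it vary.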
I noticed that the dispersion parameter is quite high at 13.1. I'm not sure
whether this matters for beta models. I ran DHARMa's simulateResiduals()
on the fitted model and got significant deviations in both the dispersion
test (p < 2.2e-16) and the KS test, e.g.
DHARMa nonparametric dispersion test via sd of residuals fitted vs. simulated

data:  simulationOutput
ratioObsSim = 1.3612, p-value < 2.2e-16
alternative hypothesis: two.sided
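On the value of 13.1 itself: a minimal sketch, assuming the parameterisation documented for glmmTMB's beta_family(), where the reported parameter phi acts as a precision and Var = mu * (1 - mu) / (1 + phi). Under that assumption a larger phi means less spread, so the number on its own does not indicate overdispersion; the DHARMa tests are the relevant check. The variable names are illustrative only.

```r
# Values taken from the summary output in this message.
phi <- 13.1             # "Overdispersion parameter for beta family"
mu  <- plogis(-2.7685)  # conditional mean for group A, zone_A

# Assumed parameterisation: Var = mu * (1 - mu) / (1 + phi)
beta_var <- mu * (1 - mu) / (1 + phi)
beta_sd  <- sqrt(beta_var)

# Equivalent shape parameters of the standard beta distribution.
shape1 <- mu * phi
shape2 <- (1 - mu) * phi

# Check the variance formula against a large simulated sample.
set.seed(1)
draws <- rbeta(1e5, shape1, shape2)
c(formula_sd = beta_sd, simulated_sd = sd(draws))
```

The two standard deviations should agree closely, which confirms that shape1 = mu * phi and shape2 = (1 - mu) * phi reproduce the stated variance.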
Many thanks,
On Mon, 17 May 2021 at 13:22, Thierry Onkelinx <thierry.onkelinx using inbo.be> wrote:
> Dear Michael,
> Your data has bounds (lower bound at 0 and upper bound at 300) and you
> have a lot of data close to a boundary. In such a case, a continuous
> distribution which ignores those bounds is not a good idea. If the time
> spent outside of both zones is limited, then a long time in zone A excludes
> a long time in zone B by definition. Then I'd look towards a multinomial
> distribution. If the time spent outside both zones is dominant, then you
> can use a zero-inflated beta as you suggested. A zero-inflated gamma might
> be OK if the data is not too close to the upper boundary. If you are
> considering zero-inflated beta vs zero-inflated gamma, then you should
> choose zero-inflated beta IMHO.
> Best regards,
> On Mon 17 May 2021 at 13:52, Michael Lawson via R-sig-mixed-models <
> r-sig-mixed-models using r-project.org> wrote:
>> Hello,
>> I am new to GLMMs and have a dataset where I have two distinct groups (A
>> and B) of 7 individuals each. The data consists of repeated measurements of
>> each individual where the amount of time spent at either zone_A or zone_B
>> is recorded (out of a total time of 300s/observation period). For most of
>> the time period the individuals are in neither zone.
>> I want to test if group A and group B spend more time in zone A compared to
>> zone B (and vice versa).
>> Speaking to someone else, they said I should use a Binomial GLMM using
>> cbind. i.e.
>> cbind(time_at_zone_A, time_at_zone_B) ~ group + (1| id).
>> However, the response variable is continuous (albeit with an upper bound of
>> 300 seconds per observation period), so I'm not sure if this is appropriate?
>> Should I convert the response into a proportion and use something like a
>> Beta GLMM or else use a continuous (Gamma) GLMM? e.g. something like:
>> prop_time ~ zone*group + (1|id)
>> The data is quite heavily right-skewed and contains a lot of 0's, so
>> reading around it also looks like I may need to convert these into a
>> zero-inflated/hurdle model?
>> Thank you for any suggestions,
>> Mike
