[R-sig-ME] Model specification/family for a continuous/proportional response with many zeros

Mon May 17 14:22:41 CEST 2021

Dear Michael,

Your data are bounded (lower bound at 0, upper bound at 300) and a lot of
the observations lie close to a boundary. In such a case, a continuous
distribution which ignores those bounds is not a good idea. If the time
spent outside of both zones is limited, then a long time in zone A excludes
a long time in zone B by definition, because the times in zone A, in zone B
and in neither zone must add up to 300 s. Then I'd look towards a
multinomial distribution. If the time spent outside both zones is dominant,
then you can use a zero-inflated beta on the proportion of time, as you
suggested. A zero-inflated gamma might be OK if the data are not too close
to the upper boundary. If you are choosing between zero-inflated beta and
zero-inflated gamma, then I would choose zero-inflated beta IMHO.
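
A minimal sketch of the zero-inflated beta option, assuming the glmmTMB
package and long-format data with one row per individual, observation
period and zone. The column names (id, group, zone, time, prop_time) and
the simulated toy data are made up for illustration; they are not your
data, and the zero-inflation formula is only a starting point.

```r
# Toy data only: simulated so that the sketch runs (NOT the real data).
set.seed(1)
d <- expand.grid(obs = 1:10, zone = c("A", "B"), id = factor(1:14))
# First 7 individuals in group A, the other 7 in group B
d$group <- factor(ifelse(as.integer(d$id) <= 7, "A", "B"))
# About half exact zeros, otherwise a small share of the 300 s period
d$time <- ifelse(runif(nrow(d)) < 0.5, 0, 300 * rbeta(nrow(d), 1, 8))

# Rescale the bounded response to a proportion of the observation period
d$prop_time <- d$time / 300

# The beta part needs values strictly below 1; the zeros are left to the
# zero-inflation part of the model
stopifnot(all(d$prop_time >= 0), all(d$prop_time < 1))

# Fit only if glmmTMB is installed (assumed, not required by the sketch)
if (requireNamespace("glmmTMB", quietly = TRUE)) {
  m_zib <- glmmTMB::glmmTMB(
    prop_time ~ zone * group + (1 | id),      # conditional (beta) part
    ziformula = ~ 1,                          # constant probability of a zero
    family = glmmTMB::beta_family(link = "logit"),
    data = d
  )
  print(summary(m_zib))
}
```

If the probability of a zero is expected to differ between zones or
groups, the zero-inflation formula can be extended, e.g. to
ziformula = ~ zone * group.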

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

On Mon 17 May 2021 at 13:52, Michael Lawson via R-sig-mixed-models <
r-sig-mixed-models using r-project.org> wrote:

> Hello,
>
> I am new to GLMMs and have a dataset where I have two distinct groups (A
> and B) of 7 individuals each. The data consists of repeated measurements of
> each individual where the amount of time spent at either zone_A or zone_B
> is recorded (out of a total time of 300s/observation period). For most of
> the time period the individuals are in neither zone.
>
> I want to test if group A and group B spend more time in zone A compared to
> zone B (and vice versa).
>
> Speaking to someone else, they said I should use a Binomial GLMM using
> cbind. i.e.
> cbind(time_at_zone_A, time_at_zone_B) ~ group + (1| id).
>
> However, the response variable is continuous (albeit with an upper bound of
> 300 seconds per observation period), so I'm not sure if this is
> appropriate?
>
> Should I convert the response into a proportion and use something like a
> Beta GLMM or else use a continuous (Gamma) GLMM? e.g. something like:
> prop_time ~ zone*group + (1|id)
>
> The data are quite heavily right-skewed and contain a lot of zeros, so
> from reading around it also looks like I may need to use a
> zero-inflated/hurdle model?
>
> Thank you for any suggestions,
> Mike