[R-sig-ME] Mixed model compositional data

Tue Mar 26 14:46:53 CET 2019

Hi Luca,

as far as I know the poisson regression you got is the best way to deal
with this. In the end, your raw-data is not compositional (i.e. summing up
to 100%) but it is count data (summing up to N-AVAILABLE), which you COULD
transform to compositional (i.e. to %; introducing a linear combination
between the different response types, which would then require a Dirichlet
regression). So in the end, of course there is a basic
methodological/theoretical implication, such that observing 7 in 10 means
something different then 7 in 20 ... and that is what offset is exactly
for. What I think the model - does not- account for is, that having 100
available observations also provides more precision in the count estimate,
such that you would trust 70 out of 100 more than 7 out of 10 when
"stating" this is 70%. I guess, one could therefore think about adjusting
the "weights" option in the model, vial 'available' too, to alter the
dispersion parameter in the poisson distribution, but I think this is
rather uncommon (at least the GLM help file says so... :)).

But if you feel uncomfortable with this, then you could switch to
Hierarchical Bayesian models and the package 'brms'
For this you would use the second version of your suggested data frame, but
calculate proportions of DV=response/available (thus make it compositional)
beforehand; then the basic model code would be:
brm(DV~Behavior+(1|Factor1)+(1|Factor2), data=...,
family=dirichlet(link="logit"), etc...)
This model code pretty much reads like a mixed model (i.e. random variance
for Factor1 intercepts will be estimated via likelihood), you just have to
figure out how to define the priors :), and which question you want to test
(e.g. parameter estimation, which Bayes Factors etc).

This feature in brm seems to be pretty new (2 days old), and I am not 100%
sure how to deal with this interaction. So I guess, if you want to follow
this option, you might want to post your question in this stan forum:
https://discourse.mc-stan.org

Hope this helps.

Best, René

Am Di., 26. März 2019 um 13:34 Uhr schrieb Boku mail <
luca.corlatti using boku.ac.at>:

> Dear list members,
> Suppose I have a dataset like this:
>
> FACTOR1 FACTOR2 TREATMENT AVAILABLE RESTING WALKING FEEDING DRINKING
> 1 1 A 66 40 12 6 8
> 1 1 B 72 28 22 8 14
> 1 1 C 50 20 10 4 16
> 2 1 A 63 30 12 8 13
> 2 1 B 59 13 27 4 15
> 2 1 C 50 18 20 3 9
> 1 2 A 70 40 20 5 5
> 1 2 B 80 45 15 5 15
> 1 2 C 60 15 15 20 10
> 2 2 A 58 28 20 4 6
> 2 2 B 75 18 22 10 15
> 2 2 C 51 19 21 5 6
> …
>
> Where I have behavioral data (“RESTING”, “WALKING”, “FEEDING”, “DRINKING”)
> collected using scan sampling (i.e., each row corresponds to one scan,
> where a number “AVAILABLE” of animals have been observed, and the behavior
> of each animal has been categorized into “resting”, “walking” “feeding” or
> “drinking”). Additionally, observations are grouped in 2 factors (“FACTOR1”
> and “FACTOR2”) to be considered as random. I am interested in knowing the
> effect of the variable “TREATMENT” on each behavioral category.
>
> The main problem is that the behavioral variables represent compositional
> data (i.e. they sum up to the “AVAILABLE” number of animals; to put it in
> other words, the proportions of animals within each behavioral category
> sums up to 1). The packages dealing with compositional data do not appear
> to handle random factors, thus I am wondering what would be the most
> sensible way to deal with this kind of data.
>
> Would this workaround make sense?
> First, reorganize the multivariate data with an extra lowest level
> indicating the responses:
>
> FACTOR1 FACTOR2 TREATMENT AVAILABLE BEHAVIOR RESPONSE
> 1 1 A 66 RESTING 40
> 1 1 B 72 RESTING 28
> 1 1 C 50 RESTING 20
> 2 1 A 63 RESTING 30
> 2 1 B 59 RESTING 13
> 2 1 C 50 RESTING 20
> 1 2 A 70 RESTING 18
> 1 2 B 80 RESTING 45
> 1 2 C 60 RESTING 15
> 2 2 A 58 RESTING 28
> 2 2 B 75 RESTING 18
> 2 2 C 51 RESTING 19
> …
>
> Then, fit a model like:
>
> Mod <- glmmTMB(RESPONSE ~ TREATMENT * BEHAVIOR + (1|FACTOR1) + (1|FACTOR2)
> + offset(log(AVAILABLE), family=Poisson, data=my.data)
>
> I am not sure this workaround would deal efficiently with the
> compositional nature of the data.
>
> Thanks a lot in advance!
> Best,
> Luca
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]