[R-sig-ME] Mixed model compositional data

Tue Mar 26 13:33:59 CET 2019

Dear list members,
Suppose I have a dataset like this:

FACTOR1 FACTOR2 TREATMENT AVAILABLE RESTING WALKING FEEDING DRINKING
1       1       A         66        40      12      6       8
1       1       B         72        28      22      8       14
1       1       C         50        20      10      4       16
2       1       A         63        30      12      8       13
2       1       B         59        13      27      4       15
2       1       C         50        18      20      3       9
1       2       A         70        40      20      5       5
1       2       B         80        45      15      5       15
1       2       C         60        15      15      20      10
2       2       A         58        28      20      4       6
2       2       B         75        18      22      10      15
2       2       C         51        19      21      5       6
…

Here, the behavioral data (“RESTING”, “WALKING”, “FEEDING”, “DRINKING”) were collected by scan sampling: each row corresponds to one scan, in which “AVAILABLE” animals were observed and the behavior of each animal was classified as resting, walking, feeding, or drinking. The observations are also grouped by two factors (“FACTOR1” and “FACTOR2”) that should be treated as random effects. I am interested in the effect of “TREATMENT” on each behavioral category.

The main problem is that the behavioral variables are compositional: in each scan the four counts sum to “AVAILABLE” (in other words, the proportions of animals across the behavioral categories sum to 1). Packages for compositional data do not appear to handle random factors, so I am wondering what the most sensible way to analyze this kind of data would be.
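The closure constraint can be made concrete with a short base-R sketch. The object names `wide`, `behaviors`, `counts` and `props` are illustrative, not part of the original analysis. Dividing each scan's counts by their total gives proportions that always sum to 1, which is why the four responses cannot vary independently; the last lines are a simple sanity check comparing each scan's counts with its AVAILABLE value.

```r
# The example scans from the post, one row per scan (hypothetical object name)
wide <- read.table(header = TRUE, text = "
FACTOR1 FACTOR2 TREATMENT AVAILABLE RESTING WALKING FEEDING DRINKING
1 1 A 66 40 12  6  8
1 1 B 72 28 22  8 14
1 1 C 50 20 10  4 16
2 1 A 63 30 12  8 13
2 1 B 59 13 27  4 15
2 1 C 50 18 20  3  9
1 2 A 70 40 20  5  5
1 2 B 80 45 15  5 15
1 2 C 60 15 15 20 10
2 2 A 58 28 20  4  6
2 2 B 75 18 22 10 15
2 2 C 51 19 21  5  6
")
behaviors <- c("RESTING", "WALKING", "FEEDING", "DRINKING")
counts <- as.matrix(wide[behaviors])

# Closure: dividing each row by its own total (row-wise recycling)
# gives proportions that sum to exactly 1 within every scan
props <- counts / rowSums(counts)
round(props, 2)

# Sanity check: scans whose behavior counts do not add up to AVAILABLE
mismatch <- unname(which(rowSums(counts) != wide$AVAILABLE))
mismatch  # 11 for the rows above (18 + 22 + 10 + 15 = 65, AVAILABLE = 75)
```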

Would this workaround make sense?
First, reshape the multivariate data into long format, with an extra lowest level (“BEHAVIOR”) indicating which response each count belongs to:

FACTOR1 FACTOR2 TREATMENT AVAILABLE BEHAVIOR RESPONSE
1       1       A         66        RESTING  40
1       1       B         72        RESTING  28
1       1       C         50        RESTING  20
2       1       A         63        RESTING  30
2       1       B         59        RESTING  13
2       1       C         50        RESTING  18
1       2       A         70        RESTING  40
1       2       B         80        RESTING  45
1       2       C         60        RESTING  15
2       2       A         58        RESTING  28
2       2       B         75        RESTING  18
2       2       C         51        RESTING  19
…
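The reshaping step can be sketched in base R as follows, assuming the wide table is held in a data frame named `wide` (a name chosen here for illustration). Each scan is repeated once per behavior, BEHAVIOR records which column a count came from, and RESPONSE holds the count; base R's `reshape()` or `tidyr::pivot_longer()` would also do the job.

```r
# The example scans from the post, one row per scan (wide format)
wide <- read.table(header = TRUE, text = "
FACTOR1 FACTOR2 TREATMENT AVAILABLE RESTING WALKING FEEDING DRINKING
1 1 A 66 40 12  6  8
1 1 B 72 28 22  8 14
1 1 C 50 20 10  4 16
2 1 A 63 30 12  8 13
2 1 B 59 13 27  4 15
2 1 C 50 18 20  3  9
1 2 A 70 40 20  5  5
1 2 B 80 45 15  5 15
1 2 C 60 15 15 20 10
2 2 A 58 28 20  4  6
2 2 B 75 18 22 10 15
2 2 C 51 19 21  5  6
")
behaviors <- c("RESTING", "WALKING", "FEEDING", "DRINKING")
id_cols <- c("FACTOR1", "FACTOR2", "TREATMENT", "AVAILABLE")

# Repeat the scan-level columns once per behavior and stack the counts
# column by column, so row k of each block matches row k of `wide`
long <- data.frame(
  wide[rep(seq_len(nrow(wide)), times = length(behaviors)), id_cols],
  BEHAVIOR = factor(rep(behaviors, each = nrow(wide)), levels = behaviors),
  RESPONSE = unlist(wide[behaviors], use.names = FALSE)
)
rownames(long) <- NULL

head(long, 12)  # first 12 rows: the RESTING block
```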

Then, fit a model like:

library(glmmTMB)
Mod <- glmmTMB(RESPONSE ~ TREATMENT * BEHAVIOR + (1|FACTOR1) + (1|FACTOR2) + offset(log(AVAILABLE)), family = poisson, data = my.data)
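As a rough check on what the offset does, the fixed part of the proposed model can be fitted with base R's `glm()` on the example rows. This sketch drops the (1|FACTOR1) and (1|FACTOR2) terms so that it runs without glmmTMB; it is a simplification, not the mixed model itself, and the object names are illustrative. Because TREATMENT * BEHAVIOR has one parameter per treatment-by-behavior cell, the fitted rate per available animal in each cell equals the pooled proportion sum(RESPONSE) / sum(AVAILABLE) for that cell.

```r
# Example scans, reshaped to one row per scan x behavior (long format)
wide <- read.table(header = TRUE, text = "
FACTOR1 FACTOR2 TREATMENT AVAILABLE RESTING WALKING FEEDING DRINKING
1 1 A 66 40 12  6  8
1 1 B 72 28 22  8 14
1 1 C 50 20 10  4 16
2 1 A 63 30 12  8 13
2 1 B 59 13 27  4 15
2 1 C 50 18 20  3  9
1 2 A 70 40 20  5  5
1 2 B 80 45 15  5 15
1 2 C 60 15 15 20 10
2 2 A 58 28 20  4  6
2 2 B 75 18 22 10 15
2 2 C 51 19 21  5  6
")
behaviors <- c("RESTING", "WALKING", "FEEDING", "DRINKING")
long <- data.frame(
  wide[rep(seq_len(nrow(wide)), times = length(behaviors)),
       c("FACTOR1", "FACTOR2", "TREATMENT", "AVAILABLE")],
  BEHAVIOR = factor(rep(behaviors, each = nrow(wide)), levels = behaviors),
  RESPONSE = unlist(wide[behaviors], use.names = FALSE)
)
long$TREATMENT <- factor(long$TREATMENT)

# Fixed-effects-only Poisson GLM with the same offset (random effects omitted)
fit <- glm(RESPONSE ~ TREATMENT * BEHAVIOR + offset(log(AVAILABLE)),
           family = poisson, data = long)

# Predicted count per available animal (AVAILABLE = 1, so the offset is 0)
cells <- expand.grid(TREATMENT = levels(long$TREATMENT),
                     BEHAVIOR = levels(long$BEHAVIOR))
cells$AVAILABLE <- 1
cells$rate <- predict(fit, newdata = cells, type = "response")

# Pooled proportion per cell; rows = TREATMENT, columns = BEHAVIOR
pooled <- with(long, tapply(RESPONSE, list(TREATMENT, BEHAVIOR), sum) /
                     tapply(AVAILABLE, list(TREATMENT, BEHAVIOR), sum))
cbind(cells, pooled = as.vector(pooled))
```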

I am not sure whether this workaround adequately accounts for the compositional nature of the data.

Thanks a lot in advance!
Best,
Luca
