[R-sig-ME] r-sig-mixed-models at r-project.org: multiple dependent variables in lmer()

Thu Oct 19 21:03:35 CEST 2017

Hi Dot,

Restructure your data into a 'very long' format, that is:

ID gp time ilr   y
1  I  1    ilr1  .
1  I  1    ilr2  .
1  I  1    ilr3  .
1  I  2    ilr1  .
1  I  2    ilr2  .
1  I  2    ilr3  .
1  I  3    ilr1  .
1  I  3    ilr2  .
1  I  3    ilr3  .
...

where 'y' is the actual ilr value for that person (in that group) at that time point for that ilr coordinate. Then you can fit a multivariate model to these data. For example, a MANOVA-type model would be:

library(nlme)
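# code the 9 time-by-ilr combinations as 1 to 9 (used for the correlation ordering and the variances)
dat$cond <- as.integer(interaction(dat$time, dat$ilr, drop = TRUE))
res <- gls(y ~ factor(gp)*factor(time)*factor(ilr), correlation = corSymm(form = ~ cond | ID), weights = varIdent(form = ~ 1 | cond), data=dat)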
anova(res)

But this will estimate 9 variances and 36 covariances (plus the fixed effects), which might be pushing things. So you might want to consider a more parsimonious model. The other extreme would be:

library(lme4)
lmer(y ~ factor(gp)*factor(time)*factor(ilr) + (1 | ID), data=dat)

or equivalently

lme(y ~ factor(gp)*factor(time)*factor(ilr), random = ~ 1 | ID, data=dat)

but this is probably way too parsimonious.
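On the question of a single test for the group-by-time interaction across all three coordinates: in the stacked model that interaction is carried by the gp:time and gp:time:ilr terms together, so one option is a likelihood-ratio test comparing the full model with a model that omits exactly those terms, both fitted by ML rather than REML. A minimal sketch on simulated data follows; it uses the random-intercept model only because it fits quickly, and the same comparison should work for the gls model above when fitted with method = "ML". The simulated data and the object names `full`, `reduced` and `lrt` are illustrative assumptions.

```r
library(nlme)

set.seed(1)
# Simulated toy data in the 'very long' layout (illustration only):
# 30 participants x 3 time points x 3 ilr coordinates, 10 per group.
dat <- expand.grid(ilr = c("ilr1", "ilr2", "ilr3"), time = 1:3, ID = 1:30)
dat$gp <- c("C", "M", "I")[(dat$ID - 1) %% 3 + 1]
u <- rnorm(30, sd = 0.3)                       # person-level intercepts
dat$y <- u[dat$ID] + rnorm(nrow(dat), sd = 0.2)

# Full model: each gp x time cell has its own mean on each coordinate.
full <- lme(y ~ factor(gp) * factor(time) * factor(ilr),
            random = ~ 1 | ID, data = dat, method = "ML")

# Reduced model: drop gp:time and gp:time:ilr, but keep group and time
# effects that are allowed to differ between coordinates.
reduced <- lme(y ~ factor(ilr) * (factor(gp) + factor(time)),
               random = ~ 1 | ID, data = dat, method = "ML")

# Joint likelihood-ratio test of the group-by-time interaction on all
# three coordinates (2 x 2 interaction parameters x 3 coordinates = 12 df).
lrt <- anova(reduced, full)
print(lrt)
```

The 12 degrees of freedom play the role of the multivariate interaction test that car::Anova() would report for a multivariate linear model, but the result depends on the covariance structure assumed for the stacked data, so it is worth repeating the comparison under a less restrictive structure.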

Best,
Wolfgang

-----Original Message-----
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Dumuid, Dorothea - tridy002
Sent: Saturday, 14 October, 2017 10:06
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] r-sig-mixed-models at r-project.org: multiple dependent variables in lmer()

We are analysing data from a randomised controlled trial for an exercise intervention.

We have 106 participants, in three groups:
(1) control (n=34)
(2) moderate exercise (n=36)
(3) intensive exercise (n=36)

We want to know if participants' use of time changed differently depending on which group they were in.

Our outcome measure is participants' 24-hour time-use composition (minutes/day spent in 4 domains: sleep, sitting, standing and physical activity).

Time use is measured at 3 time points:
(1) baseline
(2) post-intervention
(3) 12-month follow-up

Time in all four domains always adds up to 24 hours, so if all four components were included in the model there would be perfect multicollinearity. We have therefore expressed the time-use compositions as sets of three isometric log-ratio (ilr) coordinates created using an orthonormal basis. These ilr coordinates contain all the relative information in the time-use compositions and can be used to represent the compositions in multivariate statistical models.
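For readers less familiar with ilr coordinates, here is a minimal R sketch of one common orthonormal choice, pivot coordinates, in which each part is contrasted with the geometric mean of the parts after it. The post does not say which basis was used, so the pivot basis, the function name `pivot_ilr()` and the example minutes are assumptions for illustration only.

```r
# Pivot ilr coordinates, one composition per row (minimal sketch).
# ASSUMPTION: pivot basis chosen for illustration; the study's basis is not stated.
pivot_ilr <- function(x) {
  logx <- log(as.matrix(x))
  D <- ncol(logx)
  z <- matrix(NA_real_, nrow(logx), D - 1,
              dimnames = list(NULL, paste0("ilr", seq_len(D - 1))))
  for (i in seq_len(D - 1)) {
    # part i contrasted with the geometric mean of parts i+1, ..., D
    lg_rest <- rowMeans(logx[, (i + 1):D, drop = FALSE])
    z[, i] <- sqrt((D - i) / (D - i + 1)) * (logx[, i] - lg_rest)
  }
  z
}

# Hypothetical minutes/day (sleep, sitting, standing, activity); rows sum to 1440
mins <- rbind(c(480, 600, 240, 120),
              c(360, 360, 360, 360))
z <- pivot_ilr(mins)
print(round(z, 3))
```

Because the coordinates are log-ratios, rescaling a row (minutes vs. proportions of the day) does not change them. A different orthonormal basis gives a rotated set of coordinates: distances between compositions are unchanged, but the individual coordinates and their interpretation differ.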

So, the variables for our model look like this:
ID = participant ID
gp = a factor variable ("I", "M", "C"), for intense, moderate or control group
time = a factor variable (1, 2, 3) for time point of measurement
ilr1, ilr2 and ilr3 = three isometric log ratios (the dependent variables).
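Going from this layout (one row per participant and time point, with ilr1 to ilr3 as separate columns) to the 'very long' layout suggested in the reply above amounts to stacking the three columns. A minimal base-R sketch; the data frame names `wide` and `long` and the toy values are hypothetical.

```r
# Hypothetical wide data: one row per participant and time point (toy values)
wide <- data.frame(
  ID   = rep(1:2, each = 3),
  gp   = rep(c("I", "C"), each = 3),
  time = rep(1:3, times = 2),
  ilr1 = c(0.1, 0.2, 0.3, -0.1, 0.0, 0.1),
  ilr2 = c(0.5, 0.4, 0.3, 0.6, 0.6, 0.5),
  ilr3 = c(-0.2, -0.1, 0.0, -0.3, -0.3, -0.2)
)

# Stack the coordinates: one row per participant x time point x coordinate
long <- do.call(rbind, lapply(c("ilr1", "ilr2", "ilr3"), function(v) {
  data.frame(wide[, c("ID", "gp", "time")], ilr = v, y = wide[[v]])
}))

# Sort by participant, then time point, then coordinate
long <- long[order(long$ID, long$time, long$ilr), ]
rownames(long) <- NULL
print(long)
```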

We would like to run a model like this:

fit <- lmer(cbind(ilr1, ilr2, ilr3) ~ gp * time + (1 | ID))
car::Anova(fit)  # this does a Type II MANOVA Test (Pillai)

(ignoring for the moment that participants may have random slopes).

But the lme4 regression command (lmer) does not allow more than one dependent variable. It's possible to run a separate lmer() for each log-ratio coordinate, and then predict a new log-ratio coordinate for each time point, for each group. Because the log-ratio coordinates are orthonormal, we can simply put the predicted log-ratios back together, apply the inverse ilr transformation, and then compare the predicted time-use composition for each group, for each time point. So we can see from this how time in the four domains is predicted to change over the time points, for all three groups.
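The back-transformation step can be sketched as follows, assuming the coordinates were built with a pivot basis (the post does not say which basis was used, and the inverse must match the forward transformation). The helper names `pivot_basis()` and `inv_pivot_ilr()` and the predicted values are hypothetical.

```r
# Pivot basis: D x (D-1) matrix with orthonormal, zero-sum columns.
# ASSUMPTION: must match the basis used to build the ilr coordinates.
pivot_basis <- function(D) {
  V <- matrix(0, D, D - 1)
  for (i in seq_len(D - 1)) {
    V[i, i] <- sqrt((D - i) / (D - i + 1))
    V[(i + 1):D, i] <- -1 / sqrt((D - i) * (D - i + 1))
  }
  V
}

# Inverse ilr: coordinates -> clr -> composition rescaled to 'total' minutes
inv_pivot_ilr <- function(z, total = 1440) {
  if (is.null(dim(z))) z <- matrix(z, nrow = 1)
  V <- pivot_basis(ncol(z) + 1)
  expclr <- exp(z %*% t(V))
  expclr / rowSums(expclr) * total
}

# Hypothetical predicted coordinates (ilr1, ilr2, ilr3) for two group/time cells
pred <- rbind(c(0, 0, 0),
              c(0.2, -0.1, 0.4))
mins <- inv_pivot_ilr(pred)
print(round(mins, 1))
```

All-zero coordinates map back to an even split of the day (360 minutes per domain). Note that back-transformed predicted means are geometric-mean-type compositions rather than arithmetic means of minutes.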

However, we cannot work out how to compute a statistic for the interaction effect between group and time point for all the log ratios together (i.e., the set of log ratios). Is it possible to run a MANOVA of the complete set of models?

Thanks in advance!
Dot