[R-sig-ME] brms: Multivariate LMEM with systematic missing data

Thu Feb 2 11:04:40 CET 2017

Hello all,

I was wondering what would happen with a multivariate LMEM when there is
systematic missing data. I ran a crude simulation to check this situation:

Group A: Has data on Dependent variable 1 (dep1), 2 (dep2), and 3 (dep3)
Group B: Has data on dep1 and dep3
Group C: Has data on dep2 and dep3

All data is measured three times (I am disregarding the effect of time for
simplicity's sake).

The aim is to compare group A on the data it shares with group B and group
C. I created this data and ran a multivariate brms model on it. I did not
provide priors, as such, uninformative priors will be used. Code:

set.seed(103)
# Data structure:
# id | group (A/B/C) | dep1 (mean(SD), A: 10(5), B: 15(5), C: NA) | dep2
(A: 20(5), B: NA, C: 40(5)), dep3 (A: 30(5), B: 40(5), C: 30(5))

n_indiv <- 999
n_per_indiv <- 3
n_tot <- n_indiv * n_per_indiv
id <- gl(n_indiv, n_per_indiv)  ## generate identifier columns

dat <- data.frame(id=id,
                  group=rep(c("A", "B", "C"),each=n_tot/3),
                  meas=rep(c(1, 2, 3), times=n_tot/n_per_indiv),
                  dep1=c(rnorm(n_tot/3,10,5), rnorm(n_tot/3,15,5),
rep("NA", times=n_tot/3)),
                  dep2=c(rnorm(n_tot/3,20,5), rep("NA", times=n_tot/3),
rnorm(n_tot/3,40,5)),
                  dep3=c(rnorm(n_tot/3,30,5), rnorm(n_tot/3,40,5),
rnorm(n_tot/3,30,5)))

ncores <- detectCores() -1

m1_brms <- brm(cbind(dep1, dep2, dep3) ~ group + (1+meas|id), data=dat,
family="gaussian",
               n.warmup=500, n.iter = 1500, n.chains=5, cores=ncores)

sum_m1_brms <- summary(m1_brms)

Everything converged nicely (trace plots are looking good, Rhat is fine I
believe). The output of the summary:

 Family: gaussian (identity)
Formula: cbind(dep1, dep2, dep3) ~ group + (1 + meas | id)
   Data: dat (Number of observations: 2997)
Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
         total post-warmup samples = 5000
   WAIC: Not computed

Group-Level Effects:
~id (Number of levels: 999)
                              Estimate Est.Error l-95% CI u-95% CI
Eff.Sample Rhat
sd(dep1_Intercept)               45.29     37.05     2.16   145.59
402 1.00
sd(dep1_meas)                    26.92     20.37     1.30    78.80
226 1.03
sd(dep2_Intercept)               32.91     28.47     0.97   111.39
184 1.03
sd(dep2_meas)                    18.89     13.85     0.75    54.06
159 1.02
sd(dep3_Intercept)                0.54      0.50     0.02     2.08
155 1.05
sd(dep3_meas)                     0.25      0.23     0.01     0.95
93 1.05
cor(dep1_Intercept,dep1_meas)    -0.36      0.57    -0.99     0.89
338 1.02
cor(dep2_Intercept,dep2_meas)    -0.40      0.56    -0.98     0.88
241 1.01
cor(dep3_Intercept,dep3_meas)    -0.36      0.59    -0.99     0.92
386 1.01

Population-Level Effects:
               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
dep1_Intercept  1058.94     14.62  1029.65  1087.34       5000 1.00
dep2_Intercept   523.02      8.40   506.45   540.22       2150 1.00
dep3_Intercept    30.25      0.16    29.94    30.56       5000 1.00
dep1_groupB     -118.01     20.82  -160.15   -77.53       5000 1.00
dep1_groupC      940.46     20.83   900.16   982.28       5000 1.00
dep2_groupB     1476.62     12.24  1452.73  1500.95        661 1.01
dep2_groupC      953.48     11.75   930.22   976.84       5000 1.00
dep3_groupB        9.52      0.22     9.08     9.95       5000 1.00
dep3_groupC       -0.29      0.22    -0.73     0.15       5000 1.00

Family Specific Parameters:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma(dep1)         465.54      6.59   452.02   478.22        755 1.00
sigma(dep2)         262.63      3.93   254.64   270.00        712 1.01
sigma(dep3)           4.98      0.07     4.84     5.12        699 1.01
rescor(dep1,dep2)    -0.02      0.02    -0.05     0.02       5000 1.00
rescor(dep1,dep3)     0.00      0.02    -0.03     0.04       5000 1.01
rescor(dep2,dep3)    -0.02      0.02    -0.06     0.02       5000 1.00

The effects that are in the data, seem to be estimated, but the directions
and magnitudes of the estimates seem all over the place. In the data, group
A was always smaller or equal (in the case of dep3_groupC). As you can see,
the population estimate for dep1_groupB is in the wrong direction, and the
estimate for dep2_groupC is huge. The only estimates that seem non-inflated
and correct are those for dep3, which had data on all three groups.

Would it be safe to assume that systematic missing data on one group,
inflates the estimates between the remaining groups? I would love to hear
your thoughts!

One further question. I tried to perform the same multivariate analysis
using MCMCglmm, but I find the syntax of the random and rcov parameters
hard to set, and the output even harder to interpret. I would love to get
some feedback on how to perform a similar analysis using the MCMCglmm
function. Code:

m1 <- MCMCglmm(cbind(dep1, dep2, dep3) ~ group-1, random = ~us(1+meas):id,
               rcov = ~us(trait):units, data = dat, family = c("gaussian",
"gaussian", "gaussian"),
                    verbose = FALSE)

summary(m1)

Output:

 Iterations = 3001:12991
 Thinning interval  = 10
 Sample size  = 1000

 DIC: 119465.7

 G-structure:  ~us(1 + meas):id

                           post.mean   l-95% CI u-95% CI eff.samp
(Intercept):(Intercept).id    0.5868  0.0010440  2.54542    6.784
meas:(Intercept).id          -0.3008 -1.2656335  0.01044    6.621
(Intercept):meas.id          -0.3008 -1.2656335  0.01044    6.621
meas:meas.id                  0.1716  0.0004239  0.65286    6.936

 R-structure:  ~us(trait):units

                           post.mean   l-95% CI   u-95% CI eff.samp
traitdep1:traitdep1.units 2132866.20 2027876.02 2239057.50    911.5
traitdep2:traitdep1.units 1703387.51 1615965.82 1803474.41   1000.0
traitdep3:traitdep1.units   -1405.09   -2700.25      60.61    150.3
traitdep1:traitdep2.units 1703387.51 1615965.82 1803474.41   1000.0
traitdep2:traitdep2.units 2129068.98 2023232.02 2239145.62   1283.7
traitdep3:traitdep2.units   -1685.89   -3264.50    -184.62    136.2
traitdep1:traitdep3.units   -1405.09   -2700.25      60.61    150.3
traitdep2:traitdep3.units   -1685.89   -3264.50    -184.62    136.2
traitdep3:traitdep3.units      26.57      23.99      29.55    188.8

 Location effects: cbind(dep1, dep2, dep3) ~ group - 1

       post.mean l-95% CI u-95% CI eff.samp  pMCMC
groupA     30.69    30.13    31.37    226.0 <0.001 ***
groupB     41.29    39.97    42.75    122.0 <0.001 ***
groupC     31.20    29.89    32.50    151.8 <0.001 ***

Thanks in advance!

	[[alternative HTML version deleted]]