[R-sig-ME] brms: Multivariate LMEM with systematic missing data
Koen Neijenhuijs
kn.journal.news at gmail.com
Thu Feb 2 11:04:40 CET 2017
Hello all,
I was wondering what would happen with a multivariate LMEM when there is
systematic missing data. I ran a crude simulation to check this situation:
Group A: Has data on Dependent variable 1 (dep1), 2 (dep2), and 3 (dep3)
Group B: Has data on dep1 and dep3
Group C: Has data on dep2 and dep3
All data is measured three times (I am disregarding the effect of time for
simplicity's sake).
The aim is to compare group A on the data it shares with group B and group
C. I created this data and ran a multivariate brms model on it. I did not
provide priors, as such, uninformative priors will be used. Code:
set.seed(103)
# Data structure:
# id | group (A/B/C) | dep1 (mean(SD), A: 10(5), B: 15(5), C: NA) | dep2
(A: 20(5), B: NA, C: 40(5)), dep3 (A: 30(5), B: 40(5), C: 30(5))
n_indiv <- 999
n_per_indiv <- 3
n_tot <- n_indiv * n_per_indiv
id <- gl(n_indiv, n_per_indiv) ## generate identifier columns
dat <- data.frame(id=id,
group=rep(c("A", "B", "C"),each=n_tot/3),
meas=rep(c(1, 2, 3), times=n_tot/n_per_indiv),
dep1=c(rnorm(n_tot/3,10,5), rnorm(n_tot/3,15,5),
rep("NA", times=n_tot/3)),
dep2=c(rnorm(n_tot/3,20,5), rep("NA", times=n_tot/3),
rnorm(n_tot/3,40,5)),
dep3=c(rnorm(n_tot/3,30,5), rnorm(n_tot/3,40,5),
rnorm(n_tot/3,30,5)))
ncores <- detectCores() -1
m1_brms <- brm(cbind(dep1, dep2, dep3) ~ group + (1+meas|id), data=dat,
family="gaussian",
n.warmup=500, n.iter = 1500, n.chains=5, cores=ncores)
sum_m1_brms <- summary(m1_brms)
Everything converged nicely (trace plots are looking good, Rhat is fine I
believe). The output of the summary:
Family: gaussian (identity)
Formula: cbind(dep1, dep2, dep3) ~ group + (1 + meas | id)
Data: dat (Number of observations: 2997)
Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
total post-warmup samples = 5000
WAIC: Not computed
Group-Level Effects:
~id (Number of levels: 999)
Estimate Est.Error l-95% CI u-95% CI
Eff.Sample Rhat
sd(dep1_Intercept) 45.29 37.05 2.16 145.59
402 1.00
sd(dep1_meas) 26.92 20.37 1.30 78.80
226 1.03
sd(dep2_Intercept) 32.91 28.47 0.97 111.39
184 1.03
sd(dep2_meas) 18.89 13.85 0.75 54.06
159 1.02
sd(dep3_Intercept) 0.54 0.50 0.02 2.08
155 1.05
sd(dep3_meas) 0.25 0.23 0.01 0.95
93 1.05
cor(dep1_Intercept,dep1_meas) -0.36 0.57 -0.99 0.89
338 1.02
cor(dep2_Intercept,dep2_meas) -0.40 0.56 -0.98 0.88
241 1.01
cor(dep3_Intercept,dep3_meas) -0.36 0.59 -0.99 0.92
386 1.01
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
dep1_Intercept 1058.94 14.62 1029.65 1087.34 5000 1.00
dep2_Intercept 523.02 8.40 506.45 540.22 2150 1.00
dep3_Intercept 30.25 0.16 29.94 30.56 5000 1.00
dep1_groupB -118.01 20.82 -160.15 -77.53 5000 1.00
dep1_groupC 940.46 20.83 900.16 982.28 5000 1.00
dep2_groupB 1476.62 12.24 1452.73 1500.95 661 1.01
dep2_groupC 953.48 11.75 930.22 976.84 5000 1.00
dep3_groupB 9.52 0.22 9.08 9.95 5000 1.00
dep3_groupC -0.29 0.22 -0.73 0.15 5000 1.00
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma(dep1) 465.54 6.59 452.02 478.22 755 1.00
sigma(dep2) 262.63 3.93 254.64 270.00 712 1.01
sigma(dep3) 4.98 0.07 4.84 5.12 699 1.01
rescor(dep1,dep2) -0.02 0.02 -0.05 0.02 5000 1.00
rescor(dep1,dep3) 0.00 0.02 -0.03 0.04 5000 1.01
rescor(dep2,dep3) -0.02 0.02 -0.06 0.02 5000 1.00
The effects that are in the data, seem to be estimated, but the directions
and magnitudes of the estimates seem all over the place. In the data, group
A was always smaller or equal (in the case of dep3_groupC). As you can see,
the population estimate for dep1_groupB is in the wrong direction, and the
estimate for dep2_groupC is huge. The only estimates that seem non-inflated
and correct are those for dep3, which had data on all three groups.
Would it be safe to assume that systematic missing data on one group,
inflates the estimates between the remaining groups? I would love to hear
your thoughts!
One further question. I tried to perform the same multivariate analysis
using MCMCglmm, but I find the syntax of the random and rcov parameters
hard to set, and the output even harder to interpret. I would love to get
some feedback on how to perform a similar analysis using the MCMCglmm
function. Code:
m1 <- MCMCglmm(cbind(dep1, dep2, dep3) ~ group-1, random = ~us(1+meas):id,
rcov = ~us(trait):units, data = dat, family = c("gaussian",
"gaussian", "gaussian"),
verbose = FALSE)
summary(m1)
Output:
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 119465.7
G-structure: ~us(1 + meas):id
post.mean l-95% CI u-95% CI eff.samp
(Intercept):(Intercept).id 0.5868 0.0010440 2.54542 6.784
meas:(Intercept).id -0.3008 -1.2656335 0.01044 6.621
(Intercept):meas.id -0.3008 -1.2656335 0.01044 6.621
meas:meas.id 0.1716 0.0004239 0.65286 6.936
R-structure: ~us(trait):units
post.mean l-95% CI u-95% CI eff.samp
traitdep1:traitdep1.units 2132866.20 2027876.02 2239057.50 911.5
traitdep2:traitdep1.units 1703387.51 1615965.82 1803474.41 1000.0
traitdep3:traitdep1.units -1405.09 -2700.25 60.61 150.3
traitdep1:traitdep2.units 1703387.51 1615965.82 1803474.41 1000.0
traitdep2:traitdep2.units 2129068.98 2023232.02 2239145.62 1283.7
traitdep3:traitdep2.units -1685.89 -3264.50 -184.62 136.2
traitdep1:traitdep3.units -1405.09 -2700.25 60.61 150.3
traitdep2:traitdep3.units -1685.89 -3264.50 -184.62 136.2
traitdep3:traitdep3.units 26.57 23.99 29.55 188.8
Location effects: cbind(dep1, dep2, dep3) ~ group - 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
groupA 30.69 30.13 31.37 226.0 <0.001 ***
groupB 41.29 39.97 42.75 122.0 <0.001 ***
groupC 31.20 29.89 32.50 151.8 <0.001 ***
Thanks in advance!
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list