[R-sig-ME] brms: Multivariate LMEM with systematic missing data

Thu Feb 2 20:41:51 CET 2017

Could it be that this is partly because you did not include "meas" as a
fixed effect but only as a random effect? Without a corresponding fixed
effect, random effects are centered around zero, which may not be what you
want.

2017-02-02 11:04 GMT+01:00 Koen Neijenhuijs <kn.journal.news at gmail.com>:

> Hello all,
>
> I was wondering what would happen with a multivariate LMEM when there is
> systematic missing data. I ran a crude simulation to check this situation:
>
> Group A: Has data on Dependent variable 1 (dep1), 2 (dep2), and 3 (dep3)
> Group B: Has data on dep1 and dep3
> Group C: Has data on dep2 and dep3
>
> All data is measured three times (I am disregarding the effect of time for
> simplicity's sake).
>
> The aim is to compare group A on the data it shares with group B and group
> C. I created this data and ran a multivariate brms model on it. I did not
> provide priors, as such, uninformative priors will be used. Code:
>
> set.seed(103)
> # Data structure:
> # id | group (A/B/C) | dep1 (mean(SD), A: 10(5), B: 15(5), C: NA) | dep2
> (A: 20(5), B: NA, C: 40(5)), dep3 (A: 30(5), B: 40(5), C: 30(5))
>
> n_indiv <- 999
> n_per_indiv <- 3
> n_tot <- n_indiv * n_per_indiv
> id <- gl(n_indiv, n_per_indiv)  ## generate identifier columns
>
> dat <- data.frame(id=id,
>                   group=rep(c("A", "B", "C"),each=n_tot/3),
>                   meas=rep(c(1, 2, 3), times=n_tot/n_per_indiv),
>                   dep1=c(rnorm(n_tot/3,10,5), rnorm(n_tot/3,15,5),
> rep("NA", times=n_tot/3)),
>                   dep2=c(rnorm(n_tot/3,20,5), rep("NA", times=n_tot/3),
> rnorm(n_tot/3,40,5)),
>                   dep3=c(rnorm(n_tot/3,30,5), rnorm(n_tot/3,40,5),
> rnorm(n_tot/3,30,5)))
>
> ncores <- detectCores() -1
>
> m1_brms <- brm(cbind(dep1, dep2, dep3) ~ group + (1+meas|id), data=dat,
> family="gaussian",
>                n.warmup=500, n.iter = 1500, n.chains=5, cores=ncores)
>
> sum_m1_brms <- summary(m1_brms)
>
> Everything converged nicely (trace plots are looking good, Rhat is fine I
> believe). The output of the summary:
>
>  Family: gaussian (identity)
> Formula: cbind(dep1, dep2, dep3) ~ group + (1 + meas | id)
>    Data: dat (Number of observations: 2997)
> Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
>          total post-warmup samples = 5000
>    WAIC: Not computed
>
> Group-Level Effects:
> ~id (Number of levels: 999)
>                               Estimate Est.Error l-95% CI u-95% CI
> Eff.Sample Rhat
> sd(dep1_Intercept)               45.29     37.05     2.16   145.59
> 402 1.00
> sd(dep1_meas)                    26.92     20.37     1.30    78.80
> 226 1.03
> sd(dep2_Intercept)               32.91     28.47     0.97   111.39
> 184 1.03
> sd(dep2_meas)                    18.89     13.85     0.75    54.06
> 159 1.02
> sd(dep3_Intercept)                0.54      0.50     0.02     2.08
> 155 1.05
> sd(dep3_meas)                     0.25      0.23     0.01     0.95
> 93 1.05
> cor(dep1_Intercept,dep1_meas)    -0.36      0.57    -0.99     0.89
> 338 1.02
> cor(dep2_Intercept,dep2_meas)    -0.40      0.56    -0.98     0.88
> 241 1.01
> cor(dep3_Intercept,dep3_meas)    -0.36      0.59    -0.99     0.92
> 386 1.01
>
> Population-Level Effects:
>                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
> dep1_Intercept  1058.94     14.62  1029.65  1087.34       5000 1.00
> dep2_Intercept   523.02      8.40   506.45   540.22       2150 1.00
> dep3_Intercept    30.25      0.16    29.94    30.56       5000 1.00
> dep1_groupB     -118.01     20.82  -160.15   -77.53       5000 1.00
> dep1_groupC      940.46     20.83   900.16   982.28       5000 1.00
> dep2_groupB     1476.62     12.24  1452.73  1500.95        661 1.01
> dep2_groupC      953.48     11.75   930.22   976.84       5000 1.00
> dep3_groupB        9.52      0.22     9.08     9.95       5000 1.00
> dep3_groupC       -0.29      0.22    -0.73     0.15       5000 1.00
>
> Family Specific Parameters:
>                   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
> sigma(dep1)         465.54      6.59   452.02   478.22        755 1.00
> sigma(dep2)         262.63      3.93   254.64   270.00        712 1.01
> sigma(dep3)           4.98      0.07     4.84     5.12        699 1.01
> rescor(dep1,dep2)    -0.02      0.02    -0.05     0.02       5000 1.00
> rescor(dep1,dep3)     0.00      0.02    -0.03     0.04       5000 1.01
> rescor(dep2,dep3)    -0.02      0.02    -0.06     0.02       5000 1.00
>
> The effects that are in the data, seem to be estimated, but the directions
> and magnitudes of the estimates seem all over the place. In the data, group
> A was always smaller or equal (in the case of dep3_groupC). As you can see,
> the population estimate for dep1_groupB is in the wrong direction, and the
> estimate for dep2_groupC is huge. The only estimates that seem non-inflated
> and correct are those for dep3, which had data on all three groups.
>
> Would it be safe to assume that systematic missing data on one group,
> inflates the estimates between the remaining groups? I would love to hear
> your thoughts!
>
> One further question. I tried to perform the same multivariate analysis
> using MCMCglmm, but I find the syntax of the random and rcov parameters
> hard to set, and the output even harder to interpret. I would love to get
> some feedback on how to perform a similar analysis using the MCMCglmm
> function. Code:
>
> m1 <- MCMCglmm(cbind(dep1, dep2, dep3) ~ group-1, random = ~us(1+meas):id,
>                rcov = ~us(trait):units, data = dat, family = c("gaussian",
> "gaussian", "gaussian"),
>                     verbose = FALSE)
>
> summary(m1)
>
> Output:
>
>  Iterations = 3001:12991
>  Thinning interval  = 10
>  Sample size  = 1000
>
>  DIC: 119465.7
>
>  G-structure:  ~us(1 + meas):id
>
>                            post.mean   l-95% CI u-95% CI eff.samp
> (Intercept):(Intercept).id    0.5868  0.0010440  2.54542    6.784
> meas:(Intercept).id          -0.3008 -1.2656335  0.01044    6.621
> (Intercept):meas.id          -0.3008 -1.2656335  0.01044    6.621
> meas:meas.id                  0.1716  0.0004239  0.65286    6.936
>
>  R-structure:  ~us(trait):units
>
>                            post.mean   l-95% CI   u-95% CI eff.samp
> traitdep1:traitdep1.units 2132866.20 2027876.02 2239057.50    911.5
> traitdep2:traitdep1.units 1703387.51 1615965.82 1803474.41   1000.0
> traitdep3:traitdep1.units   -1405.09   -2700.25      60.61    150.3
> traitdep1:traitdep2.units 1703387.51 1615965.82 1803474.41   1000.0
> traitdep2:traitdep2.units 2129068.98 2023232.02 2239145.62   1283.7
> traitdep3:traitdep2.units   -1685.89   -3264.50    -184.62    136.2
> traitdep1:traitdep3.units   -1405.09   -2700.25      60.61    150.3
> traitdep2:traitdep3.units   -1685.89   -3264.50    -184.62    136.2
> traitdep3:traitdep3.units      26.57      23.99      29.55    188.8
>
>  Location effects: cbind(dep1, dep2, dep3) ~ group - 1
>
>        post.mean l-95% CI u-95% CI eff.samp  pMCMC
> groupA     30.69    30.13    31.37    226.0 <0.001 ***
> groupB     41.29    39.97    42.75    122.0 <0.001 ***
> groupC     31.20    29.89    32.50    151.8 <0.001 ***
>
> Thanks in advance!
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]