[R-sig-ME] brms: Multivariate LMEM with systematic missing data
Koen Neijenhuijs
kn.journal.news at gmail.com
Fri Feb 3 11:39:24 CET 2017
Hi Paul and Jarrod,
@Paul: not including "meas" as a fixed effect was an error on my part. I
reran the model with it included (note, meas had no effect size in the
generated data). The output still shows the same kind of inflated
population estimates that I noted before:
Family: gaussian (identity)
Formula: cbind(dep1, dep2, dep3) ~ group * meas + (1 + meas | id)
Data: dat (Number of observations: 2997)
Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
total post-warmup samples = 5000
WAIC: Not computed
Group-Level Effects:
~id (Number of levels: 999)
Estimate Est.Error l-95% CI u-95% CI
Eff.Sample Rhat
sd(dep1_Intercept) 41.25 32.88 1.49 125.28
722 1.01
sd(dep1_meas) 23.84 17.80 0.95 68.74
195 1.02
sd(dep2_Intercept) 28.35 21.61 0.81 81.16
643 1.00
sd(dep2_meas) 15.10 10.70 0.78 40.85
280 1.01
sd(dep3_Intercept) 0.45 0.38 0.02 1.45
652 1.00
sd(dep3_meas) 0.21 0.17 0.01 0.67
386 1.00
cor(dep1_Intercept,dep1_meas) -0.31 0.56 -0.98 0.89
691 1.01
cor(dep2_Intercept,dep2_meas) -0.31 0.56 -0.98 0.87
583 1.00
cor(dep3_Intercept,dep3_meas) -0.33 0.58 -0.99 0.90
905 1.00
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
dep1_Intercept 1165.57 38.58 1090.30 1243.48 1627 1.00
dep2_Intercept 528.38 22.42 485.62 572.92 1613 1.00
dep3_Intercept 30.07 0.43 29.25 30.91 1145 1.00
dep1_groupB -185.97 54.52 -292.98 -78.96 1703 1.00
dep1_groupC 833.20 54.37 726.62 942.00 1766 1.00
dep1_meas -53.49 17.92 -88.61 -17.70 1512 1.00
dep1_groupB:meas 33.89 25.41 -16.60 83.19 1727 1.00
dep1_groupC:meas 53.53 25.13 3.72 102.48 1672 1.00
dep2_groupB 1471.02 31.32 1409.44 1530.53 1711 1.00
dep2_groupC 945.18 31.53 882.38 1006.88 1704 1.00
dep2_meas -2.86 10.40 -23.08 16.94 1533 1.00
dep2_groupB:meas 2.63 14.58 -24.96 31.52 1551 1.00
dep2_groupC:meas 4.24 14.56 -24.24 32.84 1650 1.00
dep3_groupB 9.49 0.60 8.33 10.69 1466 1.00
dep3_groupC 0.04 0.60 -1.14 1.21 1398 1.01
dep3_meas 0.09 0.20 -0.30 0.47 1056 1.00
dep3_groupB:meas 0.02 0.28 -0.52 0.55 1366 1.00
dep3_groupC:meas -0.16 0.28 -0.72 0.37 1107 1.01
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma(dep1) 465.19 6.69 452.13 478.19 5000 1
sigma(dep2) 263.12 3.76 255.76 270.45 5000 1
sigma(dep3) 4.98 0.07 4.85 5.12 5000 1
rescor(dep1,dep2) -0.02 0.02 -0.06 0.02 5000 1
rescor(dep1,dep3) 0.01 0.02 -0.03 0.04 5000 1
rescor(dep2,dep3) -0.02 0.02 -0.06 0.02 5000 1
@Jarrod: Thanks for the translation, I will delve into it!
I believe it is safe to say that systematic missing values lead to inflated
population estimates. I reran the same type of model about 3 times (with
different seeds), and the same kind of weird things popped up. Of course, I
must confess this is a very crude simulation (my data-generation is far
from sophisticated).
Kind regards,
Koen
2017-02-02 20:41 GMT+01:00 Paul Buerkner <paul.buerkner at gmail.com>:
> Could it be that this is partly because you did not include "meas" as a
> fixed effect but only as a random effect? Without a corresponding fixed
> effect, random effects are centered around zero, which may not be what you
> want.
>
> 2017-02-02 11:04 GMT+01:00 Koen Neijenhuijs <kn.journal.news at gmail.com>:
>
>> Hello all,
>>
>> I was wondering what would happen with a multivariate LMEM when there is
>> systematic missing data. I ran a crude simulation to check this situation:
>>
>> Group A: Has data on Dependent variable 1 (dep1), 2 (dep2), and 3 (dep3)
>> Group B: Has data on dep1 and dep3
>> Group C: Has data on dep2 and dep3
>>
>> All data is measured three times (I am disregarding the effect of time for
>> simplicity's sake).
>>
>> The aim is to compare group A on the data it shares with group B and group
>> C. I created this data and ran a multivariate brms model on it. I did not
>> provide priors, as such, uninformative priors will be used. Code:
>>
>> set.seed(103)
>> # Data structure:
>> # id | group (A/B/C) | dep1 (mean(SD), A: 10(5), B: 15(5), C: NA) | dep2
>> (A: 20(5), B: NA, C: 40(5)), dep3 (A: 30(5), B: 40(5), C: 30(5))
>>
>> n_indiv <- 999
>> n_per_indiv <- 3
>> n_tot <- n_indiv * n_per_indiv
>> id <- gl(n_indiv, n_per_indiv) ## generate identifier columns
>>
>> dat <- data.frame(id=id,
>> group=rep(c("A", "B", "C"),each=n_tot/3),
>> meas=rep(c(1, 2, 3), times=n_tot/n_per_indiv),
>> dep1=c(rnorm(n_tot/3,10,5), rnorm(n_tot/3,15,5),
>> rep("NA", times=n_tot/3)),
>> dep2=c(rnorm(n_tot/3,20,5), rep("NA", times=n_tot/3),
>> rnorm(n_tot/3,40,5)),
>> dep3=c(rnorm(n_tot/3,30,5), rnorm(n_tot/3,40,5),
>> rnorm(n_tot/3,30,5)))
>>
>> ncores <- detectCores() -1
>>
>> m1_brms <- brm(cbind(dep1, dep2, dep3) ~ group + (1+meas|id), data=dat,
>> family="gaussian",
>> n.warmup=500, n.iter = 1500, n.chains=5, cores=ncores)
>>
>> sum_m1_brms <- summary(m1_brms)
>>
>> Everything converged nicely (trace plots are looking good, Rhat is fine I
>> believe). The output of the summary:
>>
>> Family: gaussian (identity)
>> Formula: cbind(dep1, dep2, dep3) ~ group + (1 + meas | id)
>> Data: dat (Number of observations: 2997)
>> Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
>> total post-warmup samples = 5000
>> WAIC: Not computed
>>
>> Group-Level Effects:
>> ~id (Number of levels: 999)
>> Estimate Est.Error l-95% CI u-95% CI
>> Eff.Sample Rhat
>> sd(dep1_Intercept) 45.29 37.05 2.16 145.59
>> 402 1.00
>> sd(dep1_meas) 26.92 20.37 1.30 78.80
>> 226 1.03
>> sd(dep2_Intercept) 32.91 28.47 0.97 111.39
>> 184 1.03
>> sd(dep2_meas) 18.89 13.85 0.75 54.06
>> 159 1.02
>> sd(dep3_Intercept) 0.54 0.50 0.02 2.08
>> 155 1.05
>> sd(dep3_meas) 0.25 0.23 0.01 0.95
>> 93 1.05
>> cor(dep1_Intercept,dep1_meas) -0.36 0.57 -0.99 0.89
>> 338 1.02
>> cor(dep2_Intercept,dep2_meas) -0.40 0.56 -0.98 0.88
>> 241 1.01
>> cor(dep3_Intercept,dep3_meas) -0.36 0.59 -0.99 0.92
>> 386 1.01
>>
>> Population-Level Effects:
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> dep1_Intercept 1058.94 14.62 1029.65 1087.34 5000 1.00
>> dep2_Intercept 523.02 8.40 506.45 540.22 2150 1.00
>> dep3_Intercept 30.25 0.16 29.94 30.56 5000 1.00
>> dep1_groupB -118.01 20.82 -160.15 -77.53 5000 1.00
>> dep1_groupC 940.46 20.83 900.16 982.28 5000 1.00
>> dep2_groupB 1476.62 12.24 1452.73 1500.95 661 1.01
>> dep2_groupC 953.48 11.75 930.22 976.84 5000 1.00
>> dep3_groupB 9.52 0.22 9.08 9.95 5000 1.00
>> dep3_groupC -0.29 0.22 -0.73 0.15 5000 1.00
>>
>> Family Specific Parameters:
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> sigma(dep1) 465.54 6.59 452.02 478.22 755 1.00
>> sigma(dep2) 262.63 3.93 254.64 270.00 712 1.01
>> sigma(dep3) 4.98 0.07 4.84 5.12 699 1.01
>> rescor(dep1,dep2) -0.02 0.02 -0.05 0.02 5000 1.00
>> rescor(dep1,dep3) 0.00 0.02 -0.03 0.04 5000 1.01
>> rescor(dep2,dep3) -0.02 0.02 -0.06 0.02 5000 1.00
>>
>> The effects that are in the data, seem to be estimated, but the directions
>> and magnitudes of the estimates seem all over the place. In the data,
>> group
>> A was always smaller or equal (in the case of dep3_groupC). As you can
>> see,
>> the population estimate for dep1_groupB is in the wrong direction, and the
>> estimate for dep2_groupC is huge. The only estimates that seem
>> non-inflated
>> and correct are those for dep3, which had data on all three groups.
>>
>> Would it be safe to assume that systematic missing data on one group,
>> inflates the estimates between the remaining groups? I would love to hear
>> your thoughts!
>>
>> One further question. I tried to perform the same multivariate analysis
>> using MCMCglmm, but I find the syntax of the random and rcov parameters
>> hard to set, and the output even harder to interpret. I would love to get
>> some feedback on how to perform a similar analysis using the MCMCglmm
>> function. Code:
>>
>> m1 <- MCMCglmm(cbind(dep1, dep2, dep3) ~ group-1, random = ~us(1+meas):id,
>> rcov = ~us(trait):units, data = dat, family = c("gaussian",
>> "gaussian", "gaussian"),
>> verbose = FALSE)
>>
>> summary(m1)
>>
>> Output:
>>
>> Iterations = 3001:12991
>> Thinning interval = 10
>> Sample size = 1000
>>
>> DIC: 119465.7
>>
>> G-structure: ~us(1 + meas):id
>>
>> post.mean l-95% CI u-95% CI eff.samp
>> (Intercept):(Intercept).id 0.5868 0.0010440 2.54542 6.784
>> meas:(Intercept).id -0.3008 -1.2656335 0.01044 6.621
>> (Intercept):meas.id -0.3008 -1.2656335 0.01044 6.621
>> meas:meas.id 0.1716 0.0004239 0.65286 6.936
>>
>> R-structure: ~us(trait):units
>>
>> post.mean l-95% CI u-95% CI eff.samp
>> traitdep1:traitdep1.units 2132866.20 2027876.02 2239057.50 911.5
>> traitdep2:traitdep1.units 1703387.51 1615965.82 1803474.41 1000.0
>> traitdep3:traitdep1.units -1405.09 -2700.25 60.61 150.3
>> traitdep1:traitdep2.units 1703387.51 1615965.82 1803474.41 1000.0
>> traitdep2:traitdep2.units 2129068.98 2023232.02 2239145.62 1283.7
>> traitdep3:traitdep2.units -1685.89 -3264.50 -184.62 136.2
>> traitdep1:traitdep3.units -1405.09 -2700.25 60.61 150.3
>> traitdep2:traitdep3.units -1685.89 -3264.50 -184.62 136.2
>> traitdep3:traitdep3.units 26.57 23.99 29.55 188.8
>>
>> Location effects: cbind(dep1, dep2, dep3) ~ group - 1
>>
>> post.mean l-95% CI u-95% CI eff.samp pMCMC
>> groupA 30.69 30.13 31.37 226.0 <0.001 ***
>> groupB 41.29 39.97 42.75 122.0 <0.001 ***
>> groupC 31.20 29.89 32.50 151.8 <0.001 ***
>>
>> Thanks in advance!
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list