[R-sig-ME] brms: Multivariate LMEM with systematic missing data

Koen Neijenhuijs kn.journal.news at gmail.com
Fri Feb 3 11:39:24 CET 2017


Hi Paul and Jarrod,

@Paul: not including "meas" as a fixed effect was an error on my part. I
reran the model with it included (note, meas had no effect size in the
generated data). The output still shows the same kind of inflated
population estimates that I noted before:

 Family: gaussian (identity)
Formula: cbind(dep1, dep2, dep3) ~ group * meas + (1 + meas | id)
   Data: dat (Number of observations: 2997)
Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
         total post-warmup samples = 5000
   WAIC: Not computed

Group-Level Effects:
~id (Number of levels: 999)
                              Estimate Est.Error l-95% CI u-95% CI
Eff.Sample Rhat
sd(dep1_Intercept)               41.25     32.88     1.49   125.28
722 1.01
sd(dep1_meas)                    23.84     17.80     0.95    68.74
195 1.02
sd(dep2_Intercept)               28.35     21.61     0.81    81.16
643 1.00
sd(dep2_meas)                    15.10     10.70     0.78    40.85
280 1.01
sd(dep3_Intercept)                0.45      0.38     0.02     1.45
652 1.00
sd(dep3_meas)                     0.21      0.17     0.01     0.67
386 1.00
cor(dep1_Intercept,dep1_meas)    -0.31      0.56    -0.98     0.89
691 1.01
cor(dep2_Intercept,dep2_meas)    -0.31      0.56    -0.98     0.87
583 1.00
cor(dep3_Intercept,dep3_meas)    -0.33      0.58    -0.99     0.90
905 1.00

Population-Level Effects:
                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
dep1_Intercept    1165.57     38.58  1090.30  1243.48       1627 1.00
dep2_Intercept     528.38     22.42   485.62   572.92       1613 1.00
dep3_Intercept      30.07      0.43    29.25    30.91       1145 1.00
dep1_groupB       -185.97     54.52  -292.98   -78.96       1703 1.00
dep1_groupC        833.20     54.37   726.62   942.00       1766 1.00
dep1_meas          -53.49     17.92   -88.61   -17.70       1512 1.00
dep1_groupB:meas    33.89     25.41   -16.60    83.19       1727 1.00
dep1_groupC:meas    53.53     25.13     3.72   102.48       1672 1.00
dep2_groupB       1471.02     31.32  1409.44  1530.53       1711 1.00
dep2_groupC        945.18     31.53   882.38  1006.88       1704 1.00
dep2_meas           -2.86     10.40   -23.08    16.94       1533 1.00
dep2_groupB:meas     2.63     14.58   -24.96    31.52       1551 1.00
dep2_groupC:meas     4.24     14.56   -24.24    32.84       1650 1.00
dep3_groupB          9.49      0.60     8.33    10.69       1466 1.00
dep3_groupC          0.04      0.60    -1.14     1.21       1398 1.01
dep3_meas            0.09      0.20    -0.30     0.47       1056 1.00
dep3_groupB:meas     0.02      0.28    -0.52     0.55       1366 1.00
dep3_groupC:meas    -0.16      0.28    -0.72     0.37       1107 1.01

Family Specific Parameters:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma(dep1)         465.19      6.69   452.13   478.19       5000    1
sigma(dep2)         263.12      3.76   255.76   270.45       5000    1
sigma(dep3)           4.98      0.07     4.85     5.12       5000    1
rescor(dep1,dep2)    -0.02      0.02    -0.06     0.02       5000    1
rescor(dep1,dep3)     0.01      0.02    -0.03     0.04       5000    1
rescor(dep2,dep3)    -0.02      0.02    -0.06     0.02       5000    1

@Jarrod: Thanks for the translation, I will delve into it!

I believe it is safe to say that systematic missing values lead to inflated
population estimates. I reran the same type of model about 3 times (with
different seeds), and the same kind of weird things popped up. Of course, I
must confess this is a very crude simulation (my data-generation is far
from sophisticated).

Kind regards,

Koen

2017-02-02 20:41 GMT+01:00 Paul Buerkner <paul.buerkner at gmail.com>:

> Could it be that this is partly because you did not include "meas" as a
> fixed effect but only as a random effect? Without a corresponding fixed
> effect, random effects are centered around zero, which may not be what you
> want.
>
> 2017-02-02 11:04 GMT+01:00 Koen Neijenhuijs <kn.journal.news at gmail.com>:
>
>> Hello all,
>>
>> I was wondering what would happen with a multivariate LMEM when there is
>> systematic missing data. I ran a crude simulation to check this situation:
>>
>> Group A: Has data on Dependent variable 1 (dep1), 2 (dep2), and 3 (dep3)
>> Group B: Has data on dep1 and dep3
>> Group C: Has data on dep2 and dep3
>>
>> All data is measured three times (I am disregarding the effect of time for
>> simplicity's sake).
>>
>> The aim is to compare group A on the data it shares with group B and group
>> C. I created this data and ran a multivariate brms model on it. I did not
>> provide priors, as such, uninformative priors will be used. Code:
>>
>> set.seed(103)
>> # Data structure:
>> # id | group (A/B/C) | dep1 (mean(SD), A: 10(5), B: 15(5), C: NA) | dep2
>> (A: 20(5), B: NA, C: 40(5)), dep3 (A: 30(5), B: 40(5), C: 30(5))
>>
>> n_indiv <- 999
>> n_per_indiv <- 3
>> n_tot <- n_indiv * n_per_indiv
>> id <- gl(n_indiv, n_per_indiv)  ## generate identifier columns
>>
>> dat <- data.frame(id=id,
>>                   group=rep(c("A", "B", "C"),each=n_tot/3),
>>                   meas=rep(c(1, 2, 3), times=n_tot/n_per_indiv),
>>                   dep1=c(rnorm(n_tot/3,10,5), rnorm(n_tot/3,15,5),
>> rep("NA", times=n_tot/3)),
>>                   dep2=c(rnorm(n_tot/3,20,5), rep("NA", times=n_tot/3),
>> rnorm(n_tot/3,40,5)),
>>                   dep3=c(rnorm(n_tot/3,30,5), rnorm(n_tot/3,40,5),
>> rnorm(n_tot/3,30,5)))
>>
>> ncores <- detectCores() -1
>>
>> m1_brms <- brm(cbind(dep1, dep2, dep3) ~ group + (1+meas|id), data=dat,
>> family="gaussian",
>>                n.warmup=500, n.iter = 1500, n.chains=5, cores=ncores)
>>
>> sum_m1_brms <- summary(m1_brms)
>>
>> Everything converged nicely (trace plots are looking good, Rhat is fine I
>> believe). The output of the summary:
>>
>>  Family: gaussian (identity)
>> Formula: cbind(dep1, dep2, dep3) ~ group + (1 + meas | id)
>>    Data: dat (Number of observations: 2997)
>> Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
>>          total post-warmup samples = 5000
>>    WAIC: Not computed
>>
>> Group-Level Effects:
>> ~id (Number of levels: 999)
>>                               Estimate Est.Error l-95% CI u-95% CI
>> Eff.Sample Rhat
>> sd(dep1_Intercept)               45.29     37.05     2.16   145.59
>> 402 1.00
>> sd(dep1_meas)                    26.92     20.37     1.30    78.80
>> 226 1.03
>> sd(dep2_Intercept)               32.91     28.47     0.97   111.39
>> 184 1.03
>> sd(dep2_meas)                    18.89     13.85     0.75    54.06
>> 159 1.02
>> sd(dep3_Intercept)                0.54      0.50     0.02     2.08
>> 155 1.05
>> sd(dep3_meas)                     0.25      0.23     0.01     0.95
>> 93 1.05
>> cor(dep1_Intercept,dep1_meas)    -0.36      0.57    -0.99     0.89
>> 338 1.02
>> cor(dep2_Intercept,dep2_meas)    -0.40      0.56    -0.98     0.88
>> 241 1.01
>> cor(dep3_Intercept,dep3_meas)    -0.36      0.59    -0.99     0.92
>> 386 1.01
>>
>> Population-Level Effects:
>>                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> dep1_Intercept  1058.94     14.62  1029.65  1087.34       5000 1.00
>> dep2_Intercept   523.02      8.40   506.45   540.22       2150 1.00
>> dep3_Intercept    30.25      0.16    29.94    30.56       5000 1.00
>> dep1_groupB     -118.01     20.82  -160.15   -77.53       5000 1.00
>> dep1_groupC      940.46     20.83   900.16   982.28       5000 1.00
>> dep2_groupB     1476.62     12.24  1452.73  1500.95        661 1.01
>> dep2_groupC      953.48     11.75   930.22   976.84       5000 1.00
>> dep3_groupB        9.52      0.22     9.08     9.95       5000 1.00
>> dep3_groupC       -0.29      0.22    -0.73     0.15       5000 1.00
>>
>> Family Specific Parameters:
>>                   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> sigma(dep1)         465.54      6.59   452.02   478.22        755 1.00
>> sigma(dep2)         262.63      3.93   254.64   270.00        712 1.01
>> sigma(dep3)           4.98      0.07     4.84     5.12        699 1.01
>> rescor(dep1,dep2)    -0.02      0.02    -0.05     0.02       5000 1.00
>> rescor(dep1,dep3)     0.00      0.02    -0.03     0.04       5000 1.01
>> rescor(dep2,dep3)    -0.02      0.02    -0.06     0.02       5000 1.00
>>
>> The effects that are in the data, seem to be estimated, but the directions
>> and magnitudes of the estimates seem all over the place. In the data,
>> group
>> A was always smaller or equal (in the case of dep3_groupC). As you can
>> see,
>> the population estimate for dep1_groupB is in the wrong direction, and the
>> estimate for dep2_groupC is huge. The only estimates that seem
>> non-inflated
>> and correct are those for dep3, which had data on all three groups.
>>
>> Would it be safe to assume that systematic missing data on one group,
>> inflates the estimates between the remaining groups? I would love to hear
>> your thoughts!
>>
>> One further question. I tried to perform the same multivariate analysis
>> using MCMCglmm, but I find the syntax of the random and rcov parameters
>> hard to set, and the output even harder to interpret. I would love to get
>> some feedback on how to perform a similar analysis using the MCMCglmm
>> function. Code:
>>
>> m1 <- MCMCglmm(cbind(dep1, dep2, dep3) ~ group-1, random = ~us(1+meas):id,
>>                rcov = ~us(trait):units, data = dat, family = c("gaussian",
>> "gaussian", "gaussian"),
>>                     verbose = FALSE)
>>
>> summary(m1)
>>
>> Output:
>>
>>  Iterations = 3001:12991
>>  Thinning interval  = 10
>>  Sample size  = 1000
>>
>>  DIC: 119465.7
>>
>>  G-structure:  ~us(1 + meas):id
>>
>>                            post.mean   l-95% CI u-95% CI eff.samp
>> (Intercept):(Intercept).id    0.5868  0.0010440  2.54542    6.784
>> meas:(Intercept).id          -0.3008 -1.2656335  0.01044    6.621
>> (Intercept):meas.id          -0.3008 -1.2656335  0.01044    6.621
>> meas:meas.id                  0.1716  0.0004239  0.65286    6.936
>>
>>  R-structure:  ~us(trait):units
>>
>>                            post.mean   l-95% CI   u-95% CI eff.samp
>> traitdep1:traitdep1.units 2132866.20 2027876.02 2239057.50    911.5
>> traitdep2:traitdep1.units 1703387.51 1615965.82 1803474.41   1000.0
>> traitdep3:traitdep1.units   -1405.09   -2700.25      60.61    150.3
>> traitdep1:traitdep2.units 1703387.51 1615965.82 1803474.41   1000.0
>> traitdep2:traitdep2.units 2129068.98 2023232.02 2239145.62   1283.7
>> traitdep3:traitdep2.units   -1685.89   -3264.50    -184.62    136.2
>> traitdep1:traitdep3.units   -1405.09   -2700.25      60.61    150.3
>> traitdep2:traitdep3.units   -1685.89   -3264.50    -184.62    136.2
>> traitdep3:traitdep3.units      26.57      23.99      29.55    188.8
>>
>>  Location effects: cbind(dep1, dep2, dep3) ~ group - 1
>>
>>        post.mean l-95% CI u-95% CI eff.samp  pMCMC
>> groupA     30.69    30.13    31.37    226.0 <0.001 ***
>> groupB     41.29    39.97    42.75    122.0 <0.001 ***
>> groupC     31.20    29.89    32.50    151.8 <0.001 ***
>>
>> Thanks in advance!
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>

	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list