[R-sig-ME] brms: Multivariate LMEM with systematic missing data

Paul Buerkner paul.buerkner at gmail.com
Fri Feb 3 11:52:16 CET 2017


Hi Koen,

now that I am looking at your simulation more closely, I am not at all
surprised that it leads to bias.

Since neither brms nor MCMCglmm (correct me if I am wrong, Jarrod) can
handle missing values, rows containing NAs are removed. In your simulation,
every row in groups B and C contains an NA, so 2/3 of the data are excluded.
In brms, the number of observations still shows 2997 because it counts
responses rather than rows: the 2997 / 3 = 999 remaining rows each
contribute 3 responses, giving 999 * 3 = 2997.
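
As a quick check (a minimal sketch, using the dat data frame from your
simulation code below), you can count how many rows survive the listwise
deletion:

## rows with no NA in any of the three responses
sum(complete.cases(dat[, c("dep1", "dep2", "dep3")]))
## 999 here, i.e. only the group A rows remain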

Best,
Paul

2017-02-03 11:39 GMT+01:00 Koen Neijenhuijs <kn.journal.news at gmail.com>:

> Hi Paul and Jarrod,
>
> @Paul: not including "meas" as a fixed effect was an error on my part. I
> reran the model with it included (note that meas has no effect in the
> generated data). The output still shows the same kind of inflated
> population-level estimates that I noted before:
>
>  Family: gaussian (identity)
> Formula: cbind(dep1, dep2, dep3) ~ group * meas + (1 + meas | id)
>    Data: dat (Number of observations: 2997)
> Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
>          total post-warmup samples = 5000
>    WAIC: Not computed
>
> Group-Level Effects:
> ~id (Number of levels: 999)
>                               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
> sd(dep1_Intercept)               41.25     32.88     1.49   125.28        722 1.01
> sd(dep1_meas)                    23.84     17.80     0.95    68.74        195 1.02
> sd(dep2_Intercept)               28.35     21.61     0.81    81.16        643 1.00
> sd(dep2_meas)                    15.10     10.70     0.78    40.85        280 1.01
> sd(dep3_Intercept)                0.45      0.38     0.02     1.45        652 1.00
> sd(dep3_meas)                     0.21      0.17     0.01     0.67        386 1.00
> cor(dep1_Intercept,dep1_meas)    -0.31      0.56    -0.98     0.89        691 1.01
> cor(dep2_Intercept,dep2_meas)    -0.31      0.56    -0.98     0.87        583 1.00
> cor(dep3_Intercept,dep3_meas)    -0.33      0.58    -0.99     0.90        905 1.00
>
> Population-Level Effects:
>                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
> dep1_Intercept    1165.57     38.58  1090.30  1243.48       1627 1.00
> dep2_Intercept     528.38     22.42   485.62   572.92       1613 1.00
> dep3_Intercept      30.07      0.43    29.25    30.91       1145 1.00
> dep1_groupB       -185.97     54.52  -292.98   -78.96       1703 1.00
> dep1_groupC        833.20     54.37   726.62   942.00       1766 1.00
> dep1_meas          -53.49     17.92   -88.61   -17.70       1512 1.00
> dep1_groupB:meas    33.89     25.41   -16.60    83.19       1727 1.00
> dep1_groupC:meas    53.53     25.13     3.72   102.48       1672 1.00
> dep2_groupB       1471.02     31.32  1409.44  1530.53       1711 1.00
> dep2_groupC        945.18     31.53   882.38  1006.88       1704 1.00
> dep2_meas           -2.86     10.40   -23.08    16.94       1533 1.00
> dep2_groupB:meas     2.63     14.58   -24.96    31.52       1551 1.00
> dep2_groupC:meas     4.24     14.56   -24.24    32.84       1650 1.00
> dep3_groupB          9.49      0.60     8.33    10.69       1466 1.00
> dep3_groupC          0.04      0.60    -1.14     1.21       1398 1.01
> dep3_meas            0.09      0.20    -0.30     0.47       1056 1.00
> dep3_groupB:meas     0.02      0.28    -0.52     0.55       1366 1.00
> dep3_groupC:meas    -0.16      0.28    -0.72     0.37       1107 1.01
>
> Family Specific Parameters:
>                   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
> sigma(dep1)         465.19      6.69   452.13   478.19       5000    1
> sigma(dep2)         263.12      3.76   255.76   270.45       5000    1
> sigma(dep3)           4.98      0.07     4.85     5.12       5000    1
> rescor(dep1,dep2)    -0.02      0.02    -0.06     0.02       5000    1
> rescor(dep1,dep3)     0.01      0.02    -0.03     0.04       5000    1
> rescor(dep2,dep3)    -0.02      0.02    -0.06     0.02       5000    1
>
> @Jarrod: Thanks for the translation, I will delve into it!
>
> I believe it is safe to say that systematic missing values lead to
> inflated population-level estimates. I reran the same type of model three
> times (with different seeds), and the same kind of odd estimates showed up
> each time. Of course, I must confess that this is a very crude simulation
> (my data generation is far from sophisticated).
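>
> For reproducibility, a minimal sketch of that rerun loop (my own wrapper;
> make_dat() is a hypothetical helper that repeats the data-generation code
> from my first message below):
>
> refit_with_seed <- function(seed) {
>   set.seed(seed)
>   dat <- make_dat()  # hypothetical helper wrapping the data generation below
>   brm(cbind(dep1, dep2, dep3) ~ group * meas + (1 + meas | id),
>       data = dat, family = "gaussian",
>       warmup = 500, iter = 1500, chains = 5, cores = ncores)
> }
> fits <- lapply(c(103, 104, 105), refit_with_seed)  # arbitrary example seeds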
>
> Kind regards,
>
> Koen
>
> 2017-02-02 20:41 GMT+01:00 Paul Buerkner <paul.buerkner at gmail.com>:
>
>> Could it be that this is partly because you did not include "meas" as a
>> fixed effect but only as a random effect? Without a corresponding fixed
>> effect, random effects are centered around zero, which may not be what you
>> want.
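>>
>> For instance, a minimal sketch (only the formula differs from your call
>> below; m2_brms is just an illustrative name, and the interaction with
>> group is optional; the key point is the population-level meas term):
>>
>> m2_brms <- brm(cbind(dep1, dep2, dep3) ~ group * meas + (1 + meas | id),
>>                data = dat, family = "gaussian",
>>                warmup = 500, iter = 1500, chains = 5, cores = ncores)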
>>
>> 2017-02-02 11:04 GMT+01:00 Koen Neijenhuijs <kn.journal.news at gmail.com>:
>>
>>> Hello all,
>>>
>>> I was wondering what would happen with a multivariate LMEM when there is
>>> systematic missing data. I ran a crude simulation to check this
>>> situation:
>>>
>>> Group A: Has data on Dependent variable 1 (dep1), 2 (dep2), and 3 (dep3)
>>> Group B: Has data on dep1 and dep3
>>> Group C: Has data on dep2 and dep3
>>>
>>> All variables are measured three times (I am disregarding the effect of
>>> time for simplicity's sake).
>>>
>>> The aim is to compare group A on the variables it shares with group B and
>>> with group C. I generated this data and ran a multivariate brms model on
>>> it. I did not specify any priors, so the default, uninformative priors
>>> are used. Code:
>>>
>>> library(brms)
>>> library(parallel)  # for detectCores()
>>>
>>> set.seed(103)
>>> # Data structure:
>>> # id | group (A/B/C) | dep1 (mean(SD), A: 10(5), B: 15(5), C: NA) |
>>> # dep2 (A: 20(5), B: NA, C: 40(5)) | dep3 (A: 30(5), B: 40(5), C: 30(5))
>>>
>>> n_indiv <- 999
>>> n_per_indiv <- 3
>>> n_tot <- n_indiv * n_per_indiv
>>> id <- gl(n_indiv, n_per_indiv)  # identifier column, one level per individual
>>>
>>> dat <- data.frame(id = id,
>>>                   group = rep(c("A", "B", "C"), each = n_tot / 3),
>>>                   meas = rep(c(1, 2, 3), times = n_tot / n_per_indiv),
>>>                   # use real NAs (not the string "NA"); otherwise the
>>>                   # columns are coerced to character
>>>                   dep1 = c(rnorm(n_tot / 3, 10, 5), rnorm(n_tot / 3, 15, 5),
>>>                            rep(NA_real_, times = n_tot / 3)),
>>>                   dep2 = c(rnorm(n_tot / 3, 20, 5), rep(NA_real_, times = n_tot / 3),
>>>                            rnorm(n_tot / 3, 40, 5)),
>>>                   dep3 = c(rnorm(n_tot / 3, 30, 5), rnorm(n_tot / 3, 40, 5),
>>>                            rnorm(n_tot / 3, 30, 5)))
>>>
>>> ncores <- detectCores() - 1
>>>
>>> m1_brms <- brm(cbind(dep1, dep2, dep3) ~ group + (1 + meas | id),
>>>                data = dat, family = "gaussian",
>>>                warmup = 500, iter = 1500, chains = 5, cores = ncores)
>>>
>>> sum_m1_brms <- summary(m1_brms)
>>>
>>> sum_m1_brms <- summary(m1_brms)
>>>
>>> Everything converged nicely (the trace plots look good and the Rhat values
>>> are fine, I believe). The output of the summary:
>>>
>>>  Family: gaussian (identity)
>>> Formula: cbind(dep1, dep2, dep3) ~ group + (1 + meas | id)
>>>    Data: dat (Number of observations: 2997)
>>> Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
>>>          total post-warmup samples = 5000
>>>    WAIC: Not computed
>>>
>>> Group-Level Effects:
>>> ~id (Number of levels: 999)
>>>                               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>>> sd(dep1_Intercept)               45.29     37.05     2.16   145.59        402 1.00
>>> sd(dep1_meas)                    26.92     20.37     1.30    78.80        226 1.03
>>> sd(dep2_Intercept)               32.91     28.47     0.97   111.39        184 1.03
>>> sd(dep2_meas)                    18.89     13.85     0.75    54.06        159 1.02
>>> sd(dep3_Intercept)                0.54      0.50     0.02     2.08        155 1.05
>>> sd(dep3_meas)                     0.25      0.23     0.01     0.95         93 1.05
>>> cor(dep1_Intercept,dep1_meas)    -0.36      0.57    -0.99     0.89        338 1.02
>>> cor(dep2_Intercept,dep2_meas)    -0.40      0.56    -0.98     0.88        241 1.01
>>> cor(dep3_Intercept,dep3_meas)    -0.36      0.59    -0.99     0.92        386 1.01
>>>
>>> Population-Level Effects:
>>>                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>>> dep1_Intercept  1058.94     14.62  1029.65  1087.34       5000 1.00
>>> dep2_Intercept   523.02      8.40   506.45   540.22       2150 1.00
>>> dep3_Intercept    30.25      0.16    29.94    30.56       5000 1.00
>>> dep1_groupB     -118.01     20.82  -160.15   -77.53       5000 1.00
>>> dep1_groupC      940.46     20.83   900.16   982.28       5000 1.00
>>> dep2_groupB     1476.62     12.24  1452.73  1500.95        661 1.01
>>> dep2_groupC      953.48     11.75   930.22   976.84       5000 1.00
>>> dep3_groupB        9.52      0.22     9.08     9.95       5000 1.00
>>> dep3_groupC       -0.29      0.22    -0.73     0.15       5000 1.00
>>>
>>> Family Specific Parameters:
>>>                   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>>> sigma(dep1)         465.54      6.59   452.02   478.22        755 1.00
>>> sigma(dep2)         262.63      3.93   254.64   270.00        712 1.01
>>> sigma(dep3)           4.98      0.07     4.84     5.12        699 1.01
>>> rescor(dep1,dep2)    -0.02      0.02    -0.05     0.02       5000 1.00
>>> rescor(dep1,dep3)     0.00      0.02    -0.03     0.04       5000 1.01
>>> rescor(dep2,dep3)    -0.02      0.02    -0.06     0.02       5000 1.00
>>>
>>> The effects that are in the data seem to be picked up, but the directions
>>> and magnitudes of the estimates are all over the place. In the generated
>>> data, group A is always smaller than or equal to the other groups (equal
>>> only in the case of dep3_groupC). As you can see, the population-level
>>> estimate for dep1_groupB is in the wrong direction, and the estimate for
>>> dep2_groupC is huge. The only estimates that look non-inflated and
>>> correct are those for dep3, which has data in all three groups.
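>>>
>>> For reference, a minimal sketch (using m1_brms from the code above) of how
>>> the estimates can be set against the generating values (true differences:
>>> dep1 B - A = 5, dep2 C - A = 20, dep3 B - A = 10 and C - A = 0):
>>>
>>> fixef(m1_brms)  # population-level (fixed effect) estimates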
>>>
>>> Would it be safe to assume that systematically missing data in one group
>>> inflates the estimates for the remaining groups? I would love to hear
>>> your thoughts!
>>>
>>> One further question: I tried to perform the same multivariate analysis
>>> using MCMCglmm, but I find the syntax of the random and rcov arguments
>>> hard to specify, and the output even harder to interpret. I would love
>>> to get some feedback on how to perform a similar analysis with the
>>> MCMCglmm function. Code:
>>>
>>> library(MCMCglmm)
>>>
>>> # random: unstructured covariance of intercept and meas slope within id;
>>> # rcov: unstructured residual covariance across the three responses (traits)
>>> m1 <- MCMCglmm(cbind(dep1, dep2, dep3) ~ group - 1,
>>>                random = ~ us(1 + meas):id,
>>>                rcov = ~ us(trait):units,
>>>                data = dat,
>>>                family = c("gaussian", "gaussian", "gaussian"),
>>>                verbose = FALSE)
>>>
>>> summary(m1)
>>>
>>> Output:
>>>
>>>  Iterations = 3001:12991
>>>  Thinning interval  = 10
>>>  Sample size  = 1000
>>>
>>>  DIC: 119465.7
>>>
>>>  G-structure:  ~us(1 + meas):id
>>>
>>>                            post.mean   l-95% CI u-95% CI eff.samp
>>> (Intercept):(Intercept).id    0.5868  0.0010440  2.54542    6.784
>>> meas:(Intercept).id          -0.3008 -1.2656335  0.01044    6.621
>>> (Intercept):meas.id          -0.3008 -1.2656335  0.01044    6.621
>>> meas:meas.id                  0.1716  0.0004239  0.65286    6.936
>>>
>>>  R-structure:  ~us(trait):units
>>>
>>>                            post.mean   l-95% CI   u-95% CI eff.samp
>>> traitdep1:traitdep1.units 2132866.20 2027876.02 2239057.50    911.5
>>> traitdep2:traitdep1.units 1703387.51 1615965.82 1803474.41   1000.0
>>> traitdep3:traitdep1.units   -1405.09   -2700.25      60.61    150.3
>>> traitdep1:traitdep2.units 1703387.51 1615965.82 1803474.41   1000.0
>>> traitdep2:traitdep2.units 2129068.98 2023232.02 2239145.62   1283.7
>>> traitdep3:traitdep2.units   -1685.89   -3264.50    -184.62    136.2
>>> traitdep1:traitdep3.units   -1405.09   -2700.25      60.61    150.3
>>> traitdep2:traitdep3.units   -1685.89   -3264.50    -184.62    136.2
>>> traitdep3:traitdep3.units      26.57      23.99      29.55    188.8
>>>
>>>  Location effects: cbind(dep1, dep2, dep3) ~ group - 1
>>>
>>>        post.mean l-95% CI u-95% CI eff.samp  pMCMC
>>> groupA     30.69    30.13    31.37    226.0 <0.001 ***
>>> groupB     41.29    39.97    42.75    122.0 <0.001 ***
>>> groupC     31.20    29.89    32.50    151.8 <0.001 ***
>>>
>>> Thanks in advance!
>>>
>>
>>
>
