[R-sig-ME] brms: Multivariate LMEM with systematic missing data
Koen Neijenhuijs
kn.journal.news at gmail.com
Fri Feb 3 14:53:21 CET 2017
Hi,
@Paul Debes: You are absolutely correct, the NAs were misspecified. I need
to take better care with my specifications.
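To make the mistake explicit for the archive, here is a toy illustration of my
own (not the thread's actual code): quoting NA turns the whole vector into
character, and data.frame() (with the old stringsAsFactors default) then
coerces it to a factor, so nothing is actually missing.

wrong <- c(rnorm(3, 10, 5), rep("NA", 3))      # "NA" is a string; the vector becomes character
right <- c(rnorm(3, 10, 5), rep(NA_real_, 3))  # bare NA keeps the vector numeric
str(wrong)         # chr [1:6] ...
str(right)         # num [1:6] ..., last three are true NAs
sum(is.na(wrong))  # 0 -- no real missingness
sum(is.na(right))  # 3

Presumably my earlier fits were therefore run on factor level codes rather
than on the simulated values, which would also explain the absurd intercepts
and sigmas in my earlier summaries further down the thread.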
@Paul Buerkner: With the above fixed, you are absolutely correct that I
only have 999 valid observations. Funnily enough, the group coefficients only
get more ridiculous (note the Rhat values of up to 3.56 and effective sample
sizes in the single digits for those terms, so those chains have clearly not
converged):
Family: gaussian (identity)
Formula: cbind(dep1, dep2, dep3) ~ group * meas + (1 + meas | id)
Data: dat (Number of observations: 999)
Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
total post-warmup samples = 5000
WAIC: Not computed
Group-Level Effects:
~id (Number of levels: 333)
                              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(dep1_Intercept)                0.68      0.50     0.03     1.93       1127 1.00
sd(dep1_meas)                     0.30      0.23     0.01     0.88        609 1.01
sd(dep2_Intercept)                0.79      0.74     0.03     3.14         69 1.07
sd(dep2_meas)                     0.37      0.31     0.02     1.27         64 1.07
sd(dep3_Intercept)                0.63      0.55     0.03     2.09        656 1.01
sd(dep3_meas)                     0.26      0.23     0.01     0.90        691 1.01
cor(dep1_Intercept,dep1_meas)    -0.28      0.57    -0.98     0.89        911 1.01
cor(dep2_Intercept,dep2_meas)    -0.28      0.58    -0.98     0.91        385 1.01
cor(dep3_Intercept,dep3_meas)    -0.32      0.58    -0.99     0.89       1020 1.01
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
dep1_Intercept 10.24 0.44 9.38 11.05 5000 1.01
dep2_Intercept 20.02 0.41 19.22 20.81 5000 1.00
dep3_Intercept 29.40 0.42 28.54 30.21 5000 1.00
dep1_groupB 210763.68 526479.52 -157461.18 1583956.49 3 3.56
dep1_groupC 9138.67 70713.21 -121719.20 157643.19 4 1.64
dep1_meas -0.19 0.20 -0.57 0.20 5000 1.00
dep1_groupB:meas -23205.11 100734.03 -312577.64 142648.44 4 2.48
dep1_groupC:meas -71455.38 157002.26 -446894.22 182257.70 5 2.17
dep2_groupB -20607.61 140109.92 -392849.23 238572.51 5 2.93
dep2_groupC -148928.00 143151.46 -433430.59 80773.75 8 2.19
dep2_meas -0.01 0.19 -0.39 0.37 5000 1.00
dep2_groupB:meas -142584.37 265356.46 -843416.71 159443.42 3 3.12
dep2_groupC:meas 43112.00 217047.74 -404953.43 715305.83 11 1.88
dep3_groupB -50794.17 106310.18 -315974.63 133696.31 5 2.20
dep3_groupC 81455.65 112784.68 -103536.02 385103.89 7 1.90
dep3_meas 0.25 0.19 -0.12 0.64 5000 1.00
dep3_groupB:meas -86503.76 82586.35 -238653.81 144958.27 19 1.34
dep3_groupC:meas -54695.42 58965.65 -177878.21 33452.76 4 1.79
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma(dep1) 5.07 0.12 4.83 5.31 5000 1.01
sigma(dep2) 4.88 0.14 4.46 5.13 112 1.04
sigma(dep3) 5.03 0.12 4.81 5.27 5000 1.00
rescor(dep1,dep2) 0.03 0.03 -0.04 0.09 5000 1.00
rescor(dep1,dep3) 0.02 0.03 -0.04 0.09 5000 1.00
rescor(dep2,dep3) 0.01 0.03 -0.06 0.08 5000 1.01
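As a belated sanity check on the corrected data (a quick sketch, assuming the
data frame from my original post with the quotes around NA removed):

str(dat)  # dep1-dep3 should now be numeric, not factor
colSums(is.na(dat[, c("dep1", "dep2", "dep3")]))
# expect 999 NAs in dep1 (group C), 999 in dep2 (group B), 0 in dep3
sum(complete.cases(dat[, c("dep1", "dep2", "dep3")]))
# 999 -- only group A has all three outcomes, hence the 999 observations above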
@Paul Debes: I am not familiar with the asreml-R package. I'm currently looking
through its documentation to see whether it can do what I want.
@Jarrod: Still very interested in hearing exactly how MCMCglmm handles missing
data. I found some earlier r-sig postings on the topic, but those seemed to
focus on missing data in the fixed-effect covariates rather than missing data
in the outcome measures.
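In the meantime, for reference, this is the kind of multivariate MCMCglmm
specification I have been experimenting with. It is only a sketch: the
trait-by-group fixed effects, the us(trait) structures, and the weak priors
are my own guesses at a sensible parameterisation (mirroring Paul Debes'
asreml model below), and whether MCMCglmm actually makes use of rows with NA
in some of the responses is exactly what I am asking about.

library(MCMCglmm)

# Placeholder weak inverse-Wishart priors for the 3x3 (co)variance structures;
# these are my assumption, adjust as needed.
prior <- list(R = list(V = diag(3), nu = 0.002),
              G = list(G1 = list(V = diag(3), nu = 0.002)))

m1_mcmcglmm <- MCMCglmm(cbind(dep1, dep2, dep3) ~ trait:group + trait:meas - 1,
                        random  = ~ us(trait):id,
                        rcov    = ~ us(trait):units,
                        family  = c("gaussian", "gaussian", "gaussian"),
                        prior   = prior,
                        data    = dat,
                        verbose = FALSE)
summary(m1_mcmcglmm)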
Thanks everyone so far for the great responses!
2017-02-03 14:19 GMT+01:00 Paul Debes <paul.debes at utu.fi>:
> Hi Koen,
>
> Your conclusion (biased estimates with missing data for specific groups)
> made me very curious if that is the case (I think this is quite important),
> so I just ran the model in asreml-R. To exclude sampling error, I generated
> the data with 'mvrnorm' from 'MASS' with 'empirical = TRUE' (don't use this
> option for simulations) rather than with 'rnorm'. It gives estimates that
> agree with the simulated data.
>
> Did you run the models with dep1 and dep2 wrongly formatted as factors by
> specifying "NA" rather than NA?
>
>
> library("MASS")
> Sigma.no.cor = matrix(0,3,3)
> diag(Sigma.no.cor) = sqrt(5)
> Sigma.no.cor
>
> empirical.switch = TRUE # FALSE # set to FALSE for simulations, to TRUE for checking
> Sigma = Sigma.no.cor
>
> A.dat.matrix = mvrnorm(n_tot/3, c(10, 20, 30), Sigma = Sigma, empirical = empirical.switch)
> B.dat.matrix = mvrnorm(n_tot/3, c(15,  0, 40), Sigma = Sigma, empirical = empirical.switch)
> B.dat.matrix[,2] = NA
> C.dat.matrix = mvrnorm(n_tot/3, c( 0, 40, 30), Sigma = Sigma, empirical = empirical.switch)
> C.dat.matrix[,1] = NA
>
> dat <- data.frame(id = id,
>                   group = rep(c("A", "B", "C"), each = n_tot/3),
>                   meas  = rep(c(1, 2, 3), times = n_tot/n_per_indiv),
>                   # dep1 = c(rnorm(n_tot/3,10,5), rnorm(n_tot/3,15,5), rep(NA, times=n_tot/3)),  # I removed your quotes!
>                   # dep2 = c(rnorm(n_tot/3,20,5), rep(NA, times=n_tot/3), rnorm(n_tot/3,40,5)),
>                   # dep3 = c(rnorm(n_tot/3,30,5), rnorm(n_tot/3,40,5), rnorm(n_tot/3,30,5)),
>                   dep1 = c(A.dat.matrix[,1], B.dat.matrix[,1], C.dat.matrix[,1]),
>                   dep2 = c(A.dat.matrix[,2], B.dat.matrix[,2], C.dat.matrix[,2]),
>                   dep3 = c(A.dat.matrix[,3], B.dat.matrix[,3], C.dat.matrix[,3]))
>
> str(dat)
>
> library("asreml")
> m1_asreml = asreml(cbind(dep1, dep2, dep3) ~ -1 + trait:group + trait:meas,
> random = ~ str(~id/meas ,~us(2):id(id)),
> rcov = ~ units:us(trait),
> data = dat,
> maxiter = 1000)
> m1_asreml = update(m1_asreml)
>
> varcomp.nice(m1_asreml,digits=3)
> component std.error z.ratio V
> id+id:meas!us(2).1:1 NA NA NA V1
> id+id:meas!us(2).2:1 -0.005 0.012 -0.41 V2
> id+id:meas!us(2).2:2 0.006 0.011 0.57 V3
> R!variance 1.000 NA NA V4
> R!trait.dep1:dep1 2.228 0.073 30.60 V5
> R!trait.dep2:dep1 -0.009 0.073 -0.12 V6
> R!trait.dep2:dep2 2.228 0.073 30.71 V7
> R!trait.dep3:dep1 -0.010 0.053 -0.19 V8
> R!trait.dep3:dep2 -0.010 0.053 -0.19 V9
> R!trait.dep3:dep3 2.225 0.060 36.88 V10
>
> coef.nice(m1_asreml,digits=3)
> solution std.error z.ratio
> trait_dep1:meas -0.006 0.041 -0.15
> trait_dep2:meas -0.010 0.041 -0.24
> trait_dep3:meas -0.033 0.033 -0.99
> trait_dep1:group_A 10.013 0.094 105.96
> trait_dep1:group_B 15.012 0.094 158.87
> trait_dep2:group_A 20.020 0.094 211.88
> trait_dep2:group_C 40.020 0.094 423.55
> trait_dep3:group_A 30.066 0.082 367.50
> trait_dep3:group_B 40.066 0.082 489.73
> trait_dep3:group_C 30.066 0.082 367.50
>
> Best,
> Paul
>
>
>
> On Fri, 03 Feb 2017 12:39:24 +0200, Koen Neijenhuijs <kn.journal.news at gmail.com> wrote:
>
> Hi Paul and Jarrod,
>>
>> @Paul: not including "meas" as a fixed effect was an error on my part. I
>> reran the model with it included (note that meas has no true effect in the
>> generated data). The output still shows the same kind of inflated
>> population estimates that I noted before:
>>
>> Family: gaussian (identity)
>> Formula: cbind(dep1, dep2, dep3) ~ group * meas + (1 + meas | id)
>> Data: dat (Number of observations: 2997)
>> Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
>> total post-warmup samples = 5000
>> WAIC: Not computed
>>
>> Group-Level Effects:
>> ~id (Number of levels: 999)
>>                               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> sd(dep1_Intercept)               41.25     32.88     1.49   125.28        722 1.01
>> sd(dep1_meas)                    23.84     17.80     0.95    68.74        195 1.02
>> sd(dep2_Intercept)               28.35     21.61     0.81    81.16        643 1.00
>> sd(dep2_meas)                    15.10     10.70     0.78    40.85        280 1.01
>> sd(dep3_Intercept)                0.45      0.38     0.02     1.45        652 1.00
>> sd(dep3_meas)                     0.21      0.17     0.01     0.67        386 1.00
>> cor(dep1_Intercept,dep1_meas)    -0.31      0.56    -0.98     0.89        691 1.01
>> cor(dep2_Intercept,dep2_meas)    -0.31      0.56    -0.98     0.87        583 1.00
>> cor(dep3_Intercept,dep3_meas)    -0.33      0.58    -0.99     0.90        905 1.00
>>
>> Population-Level Effects:
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> dep1_Intercept 1165.57 38.58 1090.30 1243.48 1627 1.00
>> dep2_Intercept 528.38 22.42 485.62 572.92 1613 1.00
>> dep3_Intercept 30.07 0.43 29.25 30.91 1145 1.00
>> dep1_groupB -185.97 54.52 -292.98 -78.96 1703 1.00
>> dep1_groupC 833.20 54.37 726.62 942.00 1766 1.00
>> dep1_meas -53.49 17.92 -88.61 -17.70 1512 1.00
>> dep1_groupB:meas 33.89 25.41 -16.60 83.19 1727 1.00
>> dep1_groupC:meas 53.53 25.13 3.72 102.48 1672 1.00
>> dep2_groupB 1471.02 31.32 1409.44 1530.53 1711 1.00
>> dep2_groupC 945.18 31.53 882.38 1006.88 1704 1.00
>> dep2_meas -2.86 10.40 -23.08 16.94 1533 1.00
>> dep2_groupB:meas 2.63 14.58 -24.96 31.52 1551 1.00
>> dep2_groupC:meas 4.24 14.56 -24.24 32.84 1650 1.00
>> dep3_groupB 9.49 0.60 8.33 10.69 1466 1.00
>> dep3_groupC 0.04 0.60 -1.14 1.21 1398 1.01
>> dep3_meas 0.09 0.20 -0.30 0.47 1056 1.00
>> dep3_groupB:meas 0.02 0.28 -0.52 0.55 1366 1.00
>> dep3_groupC:meas -0.16 0.28 -0.72 0.37 1107 1.01
>>
>> Family Specific Parameters:
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> sigma(dep1) 465.19 6.69 452.13 478.19 5000 1
>> sigma(dep2) 263.12 3.76 255.76 270.45 5000 1
>> sigma(dep3) 4.98 0.07 4.85 5.12 5000 1
>> rescor(dep1,dep2) -0.02 0.02 -0.06 0.02 5000 1
>> rescor(dep1,dep3) 0.01 0.02 -0.03 0.04 5000 1
>> rescor(dep2,dep3) -0.02 0.02 -0.06 0.02 5000 1
>>
>> @Jarrod: Thanks for the translation, I will delve into it!
>>
>> I believe it is safe to say that systematic missing values lead to inflated
>> population estimates. I reran the same type of model about three times (with
>> different seeds), and the same kind of weird things popped up. Of course, I
>> must confess this is a very crude simulation (my data-generation is far
>> from sophisticated).
>>
>> Kind regards,
>>
>> Koen
>>
>> 2017-02-02 20:41 GMT+01:00 Paul Buerkner <paul.buerkner at gmail.com>:
>>
>>> Could it be that this is partly because you did not include "meas" as a
>>> fixed effect but only as a random effect? Without a corresponding fixed
>>> effect, random effects are centered around zero, which may not be what you
>>> want.
>>>
>>> 2017-02-02 11:04 GMT+01:00 Koen Neijenhuijs <kn.journal.news at gmail.com>:
>>>
>>> Hello all,
>>>>
>>>> I was wondering what would happen with a multivariate LMEM when there is
>>>> systematic missing data. I ran a crude simulation to check this
>>>> situation:
>>>>
>>>> Group A: Has data on Dependent variable 1 (dep1), 2 (dep2), and 3 (dep3)
>>>> Group B: Has data on dep1 and dep3
>>>> Group C: Has data on dep2 and dep3
>>>>
>>>> All data is measured three times (I am disregarding the effect of time
>>>> for simplicity's sake).
>>>>
>>>> The aim is to compare group A on the data it shares with group B and with
>>>> group C. I created this data and ran a multivariate brms model on it. I did
>>>> not provide priors, so the default (uninformative) priors will be used. Code:
>>>>
>>>> set.seed(103)
>>>> # Data structure:
>>>> # id | group (A/B/C) | dep1 (mean(SD), A: 10(5), B: 15(5), C: NA) |
>>>> #   dep2 (A: 20(5), B: NA, C: 40(5)) | dep3 (A: 30(5), B: 40(5), C: 30(5))
>>>>
>>>> n_indiv <- 999
>>>> n_per_indiv <- 3
>>>> n_tot <- n_indiv * n_per_indiv
>>>> id <- gl(n_indiv, n_per_indiv) ## generate identifier columns
>>>>
>>>> dat <- data.frame(id=id,
>>>> group=rep(c("A", "B", "C"),each=n_tot/3),
>>>> meas=rep(c(1, 2, 3), times=n_tot/n_per_indiv),
>>>> dep1=c(rnorm(n_tot/3,10,5), rnorm(n_tot/3,15,5),
>>>> rep("NA", times=n_tot/3)),
>>>> dep2=c(rnorm(n_tot/3,20,5), rep("NA", times=n_tot/3),
>>>> rnorm(n_tot/3,40,5)),
>>>> dep3=c(rnorm(n_tot/3,30,5), rnorm(n_tot/3,40,5),
>>>> rnorm(n_tot/3,30,5)))
>>>>
>>>> library(brms)
>>>> library(parallel)  # needed for detectCores()
>>>> ncores <- detectCores() - 1
>>>>
>>>> m1_brms <- brm(cbind(dep1, dep2, dep3) ~ group + (1+meas|id), data=dat,
>>>> family="gaussian",
>>>> n.warmup=500, n.iter = 1500, n.chains=5, cores=ncores)
>>>>
>>>> sum_m1_brms <- summary(m1_brms)
>>>>
>>>> Everything converged nicely (trace plots are looking good, Rhat is fine, I
>>>> believe). The output of the summary:
>>>>
>>>> Family: gaussian (identity)
>>>> Formula: cbind(dep1, dep2, dep3) ~ group + (1 + meas | id)
>>>> Data: dat (Number of observations: 2997)
>>>> Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
>>>> total post-warmup samples = 5000
>>>> WAIC: Not computed
>>>>
>>>> Group-Level Effects:
>>>> ~id (Number of levels: 999)
>>>>                               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>>>> sd(dep1_Intercept)               45.29     37.05     2.16   145.59        402 1.00
>>>> sd(dep1_meas)                    26.92     20.37     1.30    78.80        226 1.03
>>>> sd(dep2_Intercept)               32.91     28.47     0.97   111.39        184 1.03
>>>> sd(dep2_meas)                    18.89     13.85     0.75    54.06        159 1.02
>>>> sd(dep3_Intercept)                0.54      0.50     0.02     2.08        155 1.05
>>>> sd(dep3_meas)                     0.25      0.23     0.01     0.95         93 1.05
>>>> cor(dep1_Intercept,dep1_meas)    -0.36      0.57    -0.99     0.89        338 1.02
>>>> cor(dep2_Intercept,dep2_meas)    -0.40      0.56    -0.98     0.88        241 1.01
>>>> cor(dep3_Intercept,dep3_meas)    -0.36      0.59    -0.99     0.92        386 1.01
>>>>
>>>> Population-Level Effects:
>>>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>>>> dep1_Intercept 1058.94 14.62 1029.65 1087.34 5000 1.00
>>>> dep2_Intercept 523.02 8.40 506.45 540.22 2150 1.00
>>>> dep3_Intercept 30.25 0.16 29.94 30.56 5000 1.00
>>>> dep1_groupB -118.01 20.82 -160.15 -77.53 5000 1.00
>>>> dep1_groupC 940.46 20.83 900.16 982.28 5000 1.00
>>>> dep2_groupB 1476.62 12.24 1452.73 1500.95 661 1.01
>>>> dep2_groupC 953.48 11.75 930.22 976.84 5000 1.00
>>>> dep3_groupB 9.52 0.22 9.08 9.95 5000 1.00
>>>> dep3_groupC -0.29 0.22 -0.73 0.15 5000 1.00
>>>>
>>>> Family Specific Parameters:
>>>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>>>> sigma(dep1) 465.54 6.59 452.02 478.22 755 1.00
>>>> sigma(dep2) 262.63 3.93 254.64 270.00 712 1.01
>>>> sigma(dep3) 4.98 0.07 4.84 5.12 699 1.01
>>>> rescor(dep1,dep2) -0.02 0.02 -0.05 0.02 5000 1.00
>>>> rescor(dep1,dep3) 0.00 0.02 -0.03 0.04 5000 1.01
>>>> rescor(dep2,dep3) -0.02 0.02 -0.06 0.02 5000 1.00
>>>>
>>>> The effects that are in the data seem to be estimated, but the directions
>>>> and magnitudes of the estimates are all over the place. In the data, group A
>>>> was always smaller than or equal to the other groups (equal in the case of
>>>> dep3_groupC). As you can see, the population estimate for dep1_groupB is in
>>>> the wrong direction, and the estimate for dep2_groupC is huge. The only
>>>> estimates that seem non-inflated and correct are those for dep3, which has
>>>> data in all three groups.
>>>>
>>>> Would it be safe to assume that systematic missing data in one group
>>>> inflates the estimates for the remaining groups? I would love to hear
>>>> your thoughts!
>>>>
>>>> One further question. I tried to perform the same multivariate analysis
>>>> using MCMCglmm, but I find the syntax of the random and rcov parameters
>>>> hard to set, and the output even harder to interpret. I would love to get
>>>> some feedback on how to perform a similar analysis using the MCMCglmm
>>>> function. Code:
>>>>
>>>> m1 <- MCMCglmm(cbind(dep1, dep2, dep3) ~ group - 1,
>>>>                random  = ~ us(1 + meas):id,
>>>>                rcov    = ~ us(trait):units,
>>>>                data    = dat,
>>>>                family  = c("gaussian", "gaussian", "gaussian"),
>>>>                verbose = FALSE)
>>>>
>>>> summary(m1)
>>>>
>>>> Output:
>>>>
>>>> Iterations = 3001:12991
>>>> Thinning interval = 10
>>>> Sample size = 1000
>>>>
>>>> DIC: 119465.7
>>>>
>>>> G-structure: ~us(1 + meas):id
>>>>
>>>> post.mean l-95% CI u-95% CI eff.samp
>>>> (Intercept):(Intercept).id 0.5868 0.0010440 2.54542 6.784
>>>> meas:(Intercept).id -0.3008 -1.2656335 0.01044 6.621
>>>> (Intercept):meas.id -0.3008 -1.2656335 0.01044 6.621
>>>> meas:meas.id 0.1716 0.0004239 0.65286 6.936
>>>>
>>>> R-structure: ~us(trait):units
>>>>
>>>> post.mean l-95% CI u-95% CI eff.samp
>>>> traitdep1:traitdep1.units 2132866.20 2027876.02 2239057.50 911.5
>>>> traitdep2:traitdep1.units 1703387.51 1615965.82 1803474.41 1000.0
>>>> traitdep3:traitdep1.units -1405.09 -2700.25 60.61 150.3
>>>> traitdep1:traitdep2.units 1703387.51 1615965.82 1803474.41 1000.0
>>>> traitdep2:traitdep2.units 2129068.98 2023232.02 2239145.62 1283.7
>>>> traitdep3:traitdep2.units -1685.89 -3264.50 -184.62 136.2
>>>> traitdep1:traitdep3.units -1405.09 -2700.25 60.61 150.3
>>>> traitdep2:traitdep3.units -1685.89 -3264.50 -184.62 136.2
>>>> traitdep3:traitdep3.units 26.57 23.99 29.55 188.8
>>>>
>>>> Location effects: cbind(dep1, dep2, dep3) ~ group - 1
>>>>
>>>> post.mean l-95% CI u-95% CI eff.samp pMCMC
>>>> groupA 30.69 30.13 31.37 226.0 <0.001 ***
>>>> groupB 41.29 39.97 42.75 122.0 <0.001 ***
>>>> groupC 31.20 29.89 32.50 151.8 <0.001 ***
>>>>
>>>> Thanks in advance!