[R-sig-ME] brms: Multivariate LMEM with systematic missing data
Koen Neijenhuijs
kn.journal.news at gmail.com
Fri Feb 3 14:53:21 CET 2017
Hi,
@Paul Debes: You are absolutely correct, the NAs were misspecified. I need
to take better care with my specifications.
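To make the mistake explicit for the archive, here is a toy illustration of my
own (not the thread's actual code): quoting NA turns the whole vector into
character, and data.frame() (with the old stringsAsFactors default) then
coerces it to a factor, so nothing is actually missing.

wrong <- c(rnorm(3, 10, 5), rep("NA", 3))      # "NA" is a string; the vector becomes character
right <- c(rnorm(3, 10, 5), rep(NA_real_, 3))  # bare NA keeps the vector numeric
str(wrong)         # chr [1:6] ...
str(right)         # num [1:6] ..., last three are true NAs
sum(is.na(wrong))  # 0 -- no real missingness
sum(is.na(right))  # 3

Presumably my earlier fits were therefore run on factor level codes rather
than on the simulated values, which would also explain the absurd intercepts
and sigmas in my earlier summaries further down the thread.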
@Paul Buerkner: With the above fixed, you are absolutely correct that I
only have 999 valid observations. Funnily enough, the group coefficients only
get more ridiculous (note the Rhat values of up to 3.56 and effective sample
sizes in the single digits for those terms, so those chains have clearly not
converged):
Family: gaussian (identity)
Formula: cbind(dep1, dep2, dep3) ~ group * meas + (1 + meas | id)
Data: dat (Number of observations: 999)
Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
total post-warmup samples = 5000
WAIC: Not computed
Group-Level Effects:
~id (Number of levels: 333)
                              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(dep1_Intercept)                0.68      0.50     0.03     1.93       1127 1.00
sd(dep1_meas)                     0.30      0.23     0.01     0.88        609 1.01
sd(dep2_Intercept)                0.79      0.74     0.03     3.14         69 1.07
sd(dep2_meas)                     0.37      0.31     0.02     1.27         64 1.07
sd(dep3_Intercept)                0.63      0.55     0.03     2.09        656 1.01
sd(dep3_meas)                     0.26      0.23     0.01     0.90        691 1.01
cor(dep1_Intercept,dep1_meas)    -0.28      0.57    -0.98     0.89        911 1.01
cor(dep2_Intercept,dep2_meas)    -0.28      0.58    -0.98     0.91        385 1.01
cor(dep3_Intercept,dep3_meas)    -0.32      0.58    -0.99     0.89       1020 1.01
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
dep1_Intercept 10.24 0.44 9.38 11.05 5000 1.01
dep2_Intercept 20.02 0.41 19.22 20.81 5000 1.00
dep3_Intercept 29.40 0.42 28.54 30.21 5000 1.00
dep1_groupB 210763.68 526479.52 -157461.18 1583956.49 3 3.56
dep1_groupC 9138.67 70713.21 -121719.20 157643.19 4 1.64
dep1_meas -0.19 0.20 -0.57 0.20 5000 1.00
dep1_groupB:meas -23205.11 100734.03 -312577.64 142648.44 4 2.48
dep1_groupC:meas -71455.38 157002.26 -446894.22 182257.70 5 2.17
dep2_groupB -20607.61 140109.92 -392849.23 238572.51 5 2.93
dep2_groupC -148928.00 143151.46 -433430.59 80773.75 8 2.19
dep2_meas -0.01 0.19 -0.39 0.37 5000 1.00
dep2_groupB:meas -142584.37 265356.46 -843416.71 159443.42 3 3.12
dep2_groupC:meas 43112.00 217047.74 -404953.43 715305.83 11 1.88
dep3_groupB -50794.17 106310.18 -315974.63 133696.31 5 2.20
dep3_groupC 81455.65 112784.68 -103536.02 385103.89 7 1.90
dep3_meas 0.25 0.19 -0.12 0.64 5000 1.00
dep3_groupB:meas -86503.76 82586.35 -238653.81 144958.27 19 1.34
dep3_groupC:meas -54695.42 58965.65 -177878.21 33452.76 4 1.79
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma(dep1) 5.07 0.12 4.83 5.31 5000 1.01
sigma(dep2) 4.88 0.14 4.46 5.13 112 1.04
sigma(dep3) 5.03 0.12 4.81 5.27 5000 1.00
rescor(dep1,dep2) 0.03 0.03 -0.04 0.09 5000 1.00
rescor(dep1,dep3) 0.02 0.03 -0.04 0.09 5000 1.00
rescor(dep2,dep3) 0.01 0.03 -0.06 0.08 5000 1.01
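As a belated sanity check on the corrected data (a quick sketch, assuming the
data frame from my original post with the quotes around NA removed):

str(dat)  # dep1-dep3 should now be numeric, not factor
colSums(is.na(dat[, c("dep1", "dep2", "dep3")]))
# expect 999 NAs in dep1 (group C), 999 in dep2 (group B), 0 in dep3
sum(complete.cases(dat[, c("dep1", "dep2", "dep3")]))
# 999 -- only group A has all three outcomes, hence the 999 observations above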
@Paul Debes: I am not familiar with the asreml-R package. I'm currently looking
through its documentation to see whether it can do what I want.
@Jarrod: Still very interested in hearing exactly how MCMCglmm handles missing
data. I found some earlier r-sig postings on the topic, but those seemed to
focus on missing data in the fixed-effect covariates rather than missing data
in the outcome measures.
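In the meantime, for reference, this is the kind of multivariate MCMCglmm
specification I have been experimenting with. It is only a sketch: the
trait-by-group fixed effects, the us(trait) structures, and the weak priors
are my own guesses at a sensible parameterisation (mirroring Paul Debes'
asreml model below), and whether MCMCglmm actually makes use of rows with NA
in some of the responses is exactly what I am asking about.

library(MCMCglmm)

# Placeholder weak inverse-Wishart priors for the 3x3 (co)variance structures;
# these are my assumption, adjust as needed.
prior <- list(R = list(V = diag(3), nu = 0.002),
              G = list(G1 = list(V = diag(3), nu = 0.002)))

m1_mcmcglmm <- MCMCglmm(cbind(dep1, dep2, dep3) ~ trait:group + trait:meas - 1,
                        random  = ~ us(trait):id,
                        rcov    = ~ us(trait):units,
                        family  = c("gaussian", "gaussian", "gaussian"),
                        prior   = prior,
                        data    = dat,
                        verbose = FALSE)
summary(m1_mcmcglmm)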
Thanks everyone so far for the great responses!
2017-02-03 14:19 GMT+01:00 Paul Debes <paul.debes at utu.fi>:
> Hi Koen,
>
> Your conclusion (biased estimates with missing data for specific groups)
> made me very curious if that is the case (I think this is quite important),
> so I just ran the model in asreml-R. To exclude sampling error, I generated
> the data with 'mvrnorm' from 'MASS' with 'empirical = TRUE' (don't use this
> option for simulations) rather than with 'rnorm'. It gives estimates that
> agree with the simulated data.
>
> Did you run the models with dep1 and dep2 wrongly formatted as factors by
> specifying "NA" rather than NA?
>
>
> library("MASS")
> Sigma.no.cor = matrix(0,3,3)
> diag(Sigma.no.cor) = sqrt(5)
> Sigma.no.cor
>
> empirical.switch = TRUE # FALSE # set to FALSE for simulations, to TRUE for checking
> Sigma = Sigma.no.cor
>
> A.dat.matrix = mvrnorm(n_tot/3, c(10, 20, 30), Sigma = Sigma, empirical = empirical.switch)
> B.dat.matrix = mvrnorm(n_tot/3, c(15,  0, 40), Sigma = Sigma, empirical = empirical.switch)
> B.dat.matrix[,2] = NA
> C.dat.matrix = mvrnorm(n_tot/3, c( 0, 40, 30), Sigma = Sigma, empirical = empirical.switch)
> C.dat.matrix[,1] = NA
>
> dat <- data.frame(id = id,
>                   group = rep(c("A", "B", "C"), each = n_tot/3),
>                   meas  = rep(c(1, 2, 3), times = n_tot/n_per_indiv),
>                   # dep1 = c(rnorm(n_tot/3,10,5), rnorm(n_tot/3,15,5), rep(NA, times=n_tot/3)),  # I removed your quotes!
>                   # dep2 = c(rnorm(n_tot/3,20,5), rep(NA, times=n_tot/3), rnorm(n_tot/3,40,5)),
>                   # dep3 = c(rnorm(n_tot/3,30,5), rnorm(n_tot/3,40,5), rnorm(n_tot/3,30,5)),
>                   dep1 = c(A.dat.matrix[,1], B.dat.matrix[,1], C.dat.matrix[,1]),
>                   dep2 = c(A.dat.matrix[,2], B.dat.matrix[,2], C.dat.matrix[,2]),
>                   dep3 = c(A.dat.matrix[,3], B.dat.matrix[,3], C.dat.matrix[,3]))
>
> str(dat)
>
> library("asreml")
> m1_asreml = asreml(cbind(dep1, dep2, dep3) ~ -1 + trait:group + trait:meas,
> random = ~ str(~id/meas ,~us(2):id(id)),
> rcov = ~ units:us(trait),
> data = dat,
> maxiter = 1000)
> m1_asreml = update(m1_asreml)
>
> varcomp.nice(m1_asreml,digits=3)
> component std.error z.ratio V
> id+id:meas!us(2).1:1 NA NA NA V1
> id+id:meas!us(2).2:1 -0.005 0.012 -0.41 V2
> id+id:meas!us(2).2:2 0.006 0.011 0.57 V3
> R!variance 1.000 NA NA V4
> R!trait.dep1:dep1 2.228 0.073 30.60 V5
> R!trait.dep2:dep1 -0.009 0.073 -0.12 V6
> R!trait.dep2:dep2 2.228 0.073 30.71 V7
> R!trait.dep3:dep1 -0.010 0.053 -0.19 V8
> R!trait.dep3:dep2 -0.010 0.053 -0.19 V9
> R!trait.dep3:dep3 2.225 0.060 36.88 V10
>
> coef.nice(m1_asreml,digits=3)
> solution std.error z.ratio
> trait_dep1:meas -0.006 0.041 -0.15
> trait_dep2:meas -0.010 0.041 -0.24
> trait_dep3:meas -0.033 0.033 -0.99
> trait_dep1:group_A 10.013 0.094 105.96
> trait_dep1:group_B 15.012 0.094 158.87
> trait_dep2:group_A 20.020 0.094 211.88
> trait_dep2:group_C 40.020 0.094 423.55
> trait_dep3:group_A 30.066 0.082 367.50
> trait_dep3:group_B 40.066 0.082 489.73
> trait_dep3:group_C 30.066 0.082 367.50
>
> Best,
> Paul
>
>
>
> On Fri, 03 Feb 2017 12:39:24 +0200, Koen Neijenhuijs <kn.journal.news at gmail.com> wrote:
>
> Hi Paul and Jarrod,
>>
>> @Paul: not including "meas" as a fixed effect was an error on my part. I
>> reran the model with it included (note that meas has no true effect in the
>> generated data). The output still shows the same kind of inflated
>> population estimates that I noted before:
>>
>> Family: gaussian (identity)
>> Formula: cbind(dep1, dep2, dep3) ~ group * meas + (1 + meas | id)
>> Data: dat (Number of observations: 2997)
>> Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
>> total post-warmup samples = 5000
>> WAIC: Not computed
>>
>> Group-Level Effects:
>> ~id (Number of levels: 999)
>>                               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> sd(dep1_Intercept)               41.25     32.88     1.49   125.28        722 1.01
>> sd(dep1_meas)                    23.84     17.80     0.95    68.74        195 1.02
>> sd(dep2_Intercept)               28.35     21.61     0.81    81.16        643 1.00
>> sd(dep2_meas)                    15.10     10.70     0.78    40.85        280 1.01
>> sd(dep3_Intercept)                0.45      0.38     0.02     1.45        652 1.00
>> sd(dep3_meas)                     0.21      0.17     0.01     0.67        386 1.00
>> cor(dep1_Intercept,dep1_meas)    -0.31      0.56    -0.98     0.89        691 1.01
>> cor(dep2_Intercept,dep2_meas)    -0.31      0.56    -0.98     0.87        583 1.00
>> cor(dep3_Intercept,dep3_meas)    -0.33      0.58    -0.99     0.90        905 1.00
>>
>> Population-Level Effects:
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> dep1_Intercept 1165.57 38.58 1090.30 1243.48 1627 1.00
>> dep2_Intercept 528.38 22.42 485.62 572.92 1613 1.00
>> dep3_Intercept 30.07 0.43 29.25 30.91 1145 1.00
>> dep1_groupB -185.97 54.52 -292.98 -78.96 1703 1.00
>> dep1_groupC 833.20 54.37 726.62 942.00 1766 1.00
>> dep1_meas -53.49 17.92 -88.61 -17.70 1512 1.00
>> dep1_groupB:meas 33.89 25.41 -16.60 83.19 1727 1.00
>> dep1_groupC:meas 53.53 25.13 3.72 102.48 1672 1.00
>> dep2_groupB 1471.02 31.32 1409.44 1530.53 1711 1.00
>> dep2_groupC 945.18 31.53 882.38 1006.88 1704 1.00
>> dep2_meas -2.86 10.40 -23.08 16.94 1533 1.00
>> dep2_groupB:meas 2.63 14.58 -24.96 31.52 1551 1.00
>> dep2_groupC:meas 4.24 14.56 -24.24 32.84 1650 1.00
>> dep3_groupB 9.49 0.60 8.33 10.69 1466 1.00
>> dep3_groupC 0.04 0.60 -1.14 1.21 1398 1.01
>> dep3_meas 0.09 0.20 -0.30 0.47 1056 1.00
>> dep3_groupB:meas 0.02 0.28 -0.52 0.55 1366 1.00
>> dep3_groupC:meas -0.16 0.28 -0.72 0.37 1107 1.01
>>
>> Family Specific Parameters:
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> sigma(dep1) 465.19 6.69 452.13 478.19 5000 1
>> sigma(dep2) 263.12 3.76 255.76 270.45 5000 1
>> sigma(dep3) 4.98 0.07 4.85 5.12 5000 1
>> rescor(dep1,dep2) -0.02 0.02 -0.06 0.02 5000 1
>> rescor(dep1,dep3) 0.01 0.02 -0.03 0.04 5000 1
>> rescor(dep2,dep3) -0.02 0.02 -0.06 0.02 5000 1
>>
>> @Jarrod: Thanks for the translation, I will delve into it!
>>
>> I believe it is safe to say that systematic missing values lead to inflated
>> population estimates. I reran the same type of model about three times (with
>> different seeds), and the same kind of weird things popped up. Of course, I
>> must confess this is a very crude simulation (my data-generation is far
>> from sophisticated).
>>
>> Kind regards,
>>
>> Koen
>>
>> 2017-02-02 20:41 GMT+01:00 Paul Buerkner <paul.buerkner at gmail.com>:
>>
>>> Could it be that this is partly because you did not include "meas" as a
>>> fixed effect but only as a random effect? Without a corresponding fixed
>>> effect, random effects are centered around zero, which may not be what you
>>> want.
>>>
>>> 2017-02-02 11:04 GMT+01:00 Koen Neijenhuijs <kn.journal.news at gmail.com>:
>>>
>>> Hello all,
>>>>
>>>> I was wondering what would happen with a multivariate LMEM when there is
>>>> systematic missing data. I ran a crude simulation to check this
>>>> situation:
>>>>
>>>> Group A: Has data on Dependent variable 1 (dep1), 2 (dep2), and 3 (dep3)
>>>> Group B: Has data on dep1 and dep3
>>>> Group C: Has data on dep2 and dep3
>>>>
>>>> All data is measured three times (I am disregarding the effect of time
>>>> for simplicity's sake).
>>>>
>>>> The aim is to compare group A on the data it shares with group B and with
>>>> group C. I created this data and ran a multivariate brms model on it. I did
>>>> not provide priors, so the default (uninformative) priors will be used. Code:
>>>>
>>>> set.seed(103)
>>>> # Data structure:
>>>> # id | group (A/B/C) | dep1 (mean(SD), A: 10(5), B: 15(5), C: NA) |
>>>> #   dep2 (A: 20(5), B: NA, C: 40(5)) | dep3 (A: 30(5), B: 40(5), C: 30(5))
>>>>
>>>> n_indiv <- 999
>>>> n_per_indiv <- 3
>>>> n_tot <- n_indiv * n_per_indiv
>>>> id <- gl(n_indiv, n_per_indiv) ## generate identifier columns
>>>>
>>>> dat <- data.frame(id=id,
>>>> group=rep(c("A", "B", "C"),each=n_tot/3),
>>>> meas=rep(c(1, 2, 3), times=n_tot/n_per_indiv),
>>>> dep1=c(rnorm(n_tot/3,10,5), rnorm(n_tot/3,15,5),
>>>> rep("NA", times=n_tot/3)),
>>>> dep2=c(rnorm(n_tot/3,20,5), rep("NA", times=n_tot/3),
>>>> rnorm(n_tot/3,40,5)),
>>>> dep3=c(rnorm(n_tot/3,30,5), rnorm(n_tot/3,40,5),
>>>> rnorm(n_tot/3,30,5)))
>>>>
>>>> library(brms)
>>>> library(parallel)  # needed for detectCores()
>>>> ncores <- detectCores() - 1
>>>>
>>>> m1_brms <- brm(cbind(dep1, dep2, dep3) ~ group + (1+meas|id), data=dat,
>>>> family="gaussian",
>>>> n.warmup=500, n.iter = 1500, n.chains=5, cores=ncores)
>>>>
>>>> sum_m1_brms <- summary(m1_brms)
>>>>
>>>> Everything converged nicely (trace plots are looking good, Rhat is fine, I
>>>> believe). The output of the summary:
>>>>
>>>> Family: gaussian (identity)
>>>> Formula: cbind(dep1, dep2, dep3) ~ group + (1 + meas | id)
>>>> Data: dat (Number of observations: 2997)
>>>> Samples: 5 chains, each with iter = 1500; warmup = 500; thin = 1;
>>>> total post-warmup samples = 5000
>>>> WAIC: Not computed
>>>>
>>>> Group-Level Effects:
>>>> ~id (Number of levels: 999)
>>>>                               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>>>> sd(dep1_Intercept)               45.29     37.05     2.16   145.59        402 1.00
>>>> sd(dep1_meas)                    26.92     20.37     1.30    78.80        226 1.03
>>>> sd(dep2_Intercept)               32.91     28.47     0.97   111.39        184 1.03
>>>> sd(dep2_meas)                    18.89     13.85     0.75    54.06        159 1.02
>>>> sd(dep3_Intercept)                0.54      0.50     0.02     2.08        155 1.05
>>>> sd(dep3_meas)                     0.25      0.23     0.01     0.95         93 1.05
>>>> cor(dep1_Intercept,dep1_meas)    -0.36      0.57    -0.99     0.89        338 1.02
>>>> cor(dep2_Intercept,dep2_meas)    -0.40      0.56    -0.98     0.88        241 1.01
>>>> cor(dep3_Intercept,dep3_meas)    -0.36      0.59    -0.99     0.92        386 1.01
>>>>
>>>> Population-Level Effects:
>>>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>>>> dep1_Intercept 1058.94 14.62 1029.65 1087.34 5000 1.00
>>>> dep2_Intercept 523.02 8.40 506.45 540.22 2150 1.00
>>>> dep3_Intercept 30.25 0.16 29.94 30.56 5000 1.00
>>>> dep1_groupB -118.01 20.82 -160.15 -77.53 5000 1.00
>>>> dep1_groupC 940.46 20.83 900.16 982.28 5000 1.00
>>>> dep2_groupB 1476.62 12.24 1452.73 1500.95 661 1.01
>>>> dep2_groupC 953.48 11.75 930.22 976.84 5000 1.00
>>>> dep3_groupB 9.52 0.22 9.08 9.95 5000 1.00
>>>> dep3_groupC -0.29 0.22 -0.73 0.15 5000 1.00
>>>>
>>>> Family Specific Parameters:
>>>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>>>> sigma(dep1) 465.54 6.59 452.02 478.22 755 1.00
>>>> sigma(dep2) 262.63 3.93 254.64 270.00 712 1.01
>>>> sigma(dep3) 4.98 0.07 4.84 5.12 699 1.01
>>>> rescor(dep1,dep2) -0.02 0.02 -0.05 0.02 5000 1.00
>>>> rescor(dep1,dep3) 0.00 0.02 -0.03 0.04 5000 1.01
>>>> rescor(dep2,dep3) -0.02 0.02 -0.06 0.02 5000 1.00
>>>>
>>>> The effects that are in the data seem to be estimated, but the directions
>>>> and magnitudes of the estimates are all over the place. In the data, group A
>>>> was always smaller than or equal to the other groups (equal in the case of
>>>> dep3_groupC). As you can see, the population estimate for dep1_groupB is in
>>>> the wrong direction, and the estimate for dep2_groupC is huge. The only
>>>> estimates that seem non-inflated and correct are those for dep3, which has
>>>> data in all three groups.
>>>>
>>>> Would it be safe to assume that systematic missing data in one group
>>>> inflates the estimates for the remaining groups? I would love to hear
>>>> your thoughts!
>>>>
>>>> One further question. I tried to perform the same multivariate analysis
>>>> using MCMCglmm, but I find the syntax of the random and rcov parameters
>>>> hard to set, and the output even harder to interpret. I would love to get
>>>> some feedback on how to perform a similar analysis using the MCMCglmm
>>>> function. Code:
>>>>
>>>> m1 <- MCMCglmm(cbind(dep1, dep2, dep3) ~ group - 1,
>>>>                random  = ~ us(1 + meas):id,
>>>>                rcov    = ~ us(trait):units,
>>>>                data    = dat,
>>>>                family  = c("gaussian", "gaussian", "gaussian"),
>>>>                verbose = FALSE)
>>>>
>>>> summary(m1)
>>>>
>>>> Output:
>>>>
>>>> Iterations = 3001:12991
>>>> Thinning interval = 10
>>>> Sample size = 1000
>>>>
>>>> DIC: 119465.7
>>>>
>>>> G-structure: ~us(1 + meas):id
>>>>
>>>> post.mean l-95% CI u-95% CI eff.samp
>>>> (Intercept):(Intercept).id 0.5868 0.0010440 2.54542 6.784
>>>> meas:(Intercept).id -0.3008 -1.2656335 0.01044 6.621
>>>> (Intercept):meas.id -0.3008 -1.2656335 0.01044 6.621
>>>> meas:meas.id 0.1716 0.0004239 0.65286 6.936
>>>>
>>>> R-structure: ~us(trait):units
>>>>
>>>> post.mean l-95% CI u-95% CI eff.samp
>>>> traitdep1:traitdep1.units 2132866.20 2027876.02 2239057.50 911.5
>>>> traitdep2:traitdep1.units 1703387.51 1615965.82 1803474.41 1000.0
>>>> traitdep3:traitdep1.units -1405.09 -2700.25 60.61 150.3
>>>> traitdep1:traitdep2.units 1703387.51 1615965.82 1803474.41 1000.0
>>>> traitdep2:traitdep2.units 2129068.98 2023232.02 2239145.62 1283.7
>>>> traitdep3:traitdep2.units -1685.89 -3264.50 -184.62 136.2
>>>> traitdep1:traitdep3.units -1405.09 -2700.25 60.61 150.3
>>>> traitdep2:traitdep3.units -1685.89 -3264.50 -184.62 136.2
>>>> traitdep3:traitdep3.units 26.57 23.99 29.55 188.8
>>>>
>>>> Location effects: cbind(dep1, dep2, dep3) ~ group - 1
>>>>
>>>> post.mean l-95% CI u-95% CI eff.samp pMCMC
>>>> groupA 30.69 30.13 31.37 226.0 <0.001 ***
>>>> groupB 41.29 39.97 42.75 122.0 <0.001 ***
>>>> groupC 31.20 29.89 32.50 151.8 <0.001 ***
>>>>
>>>> Thanks in advance!