[R-meta] Questions about fitting a multilevel meta-analysis model with effect sizes measured as different outcomes

Viechtbauer, Wolfgang (SP) wolfgang.viechtbauer at maastrichtuniversity.nl
Sun Jul 1 15:02:22 CEST 2018


Dear Heidi,

These are some interesting questions. In part, I suspect these questions are a consequence of certain multilevel software requiring users to specify the levels of their variables. This is not necessary in metafor (or nlme/lme4 or many other multilevel / mixed-effects model packages). Instead, the level of a variable is simply a consequence of its coding. Since this is a common source of confusion, I'll provide an extensive answer.

Consider an example. Let's say we start with data at two levels, a study-level table and an outcome-level table:

study  x1
1      3
2      1
3      2
4      4
...    ...

study  outcome  x2  y
1      A        5   .
1      B        8   .
2      A        3   .
3      A        4   .
3      B        8   .
3      C        6   .
4      B        5   .

So, x1 is a study-level predictor and x2 is an outcome-level predictor.

We do not really need to separate the dataset into the study and outcome levels. A predictor at the study level is just a constant within studies, so we just construct the dataset in this way:

study  outcome  x1  x2  y
1      A        3   5   .
1      B        3   8   .
2      A        1   3   .
3      A        2   4   .
3      B        2   8   .
3      C        2   6   .
4      B        4   5   .
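
If the two levels do start out as separate tables, they can be combined in R with merge(). A minimal sketch, assuming the two tables shown earlier are stored in data frames called 'dat.study' and 'dat.outcome' (hypothetical names):

dat <- merge(dat.study, dat.outcome, by="study")
dat <- dat[, c("study", "outcome", "x1", "x2", "y")]  # reorder columns for readability

The merge simply repeats each study's x1 value across all rows of that study, exactly as in the table above.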

We can then fit a model like:

rma.mv(y ~ x1 + x2, V, random = ~ 1 | study/outcome, data=dat)

which properly treats x1 as a study-level predictor and x2 as an outcome-level predictor (the order of x1 and x2 is not relevant -- 'y ~ x2 + x1' would be the same).

The other issue is the coding of the random effects. Above, "~ 1 | study/outcome" adds random effects (intercepts) for each level of 'study' and random effects (intercepts) for each level of 'outcome' within 'study'. So here, I *explicitly* indicated the nesting of the two variables.
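
To make this concrete, here is a self-contained toy version of the example (the effect sizes 'y' and sampling variances 'vi' are entirely made up for illustration):

library(metafor)

dat <- data.frame(
   study   = c(1, 1, 2, 3, 3, 3, 4),
   outcome = c("A", "B", "A", "A", "B", "C", "B"),
   x1      = c(3, 3, 1, 2, 2, 2, 4),
   x2      = c(5, 8, 3, 4, 8, 6, 5),
   y       = c(0.42, 0.65, 0.21, 0.30, 0.55, 0.48, 0.61),            # made-up effect sizes
   vi      = c(0.040, 0.050, 0.060, 0.030, 0.050, 0.045, 0.055))     # made-up sampling variances

res1 <- rma.mv(y ~ x1 + x2, V=vi, random = ~ 1 | study/outcome, data=dat)
res1

(With only 7 estimates, the variance component estimates are of course not to be taken seriously; the point is just the specification.)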

I could also do:

dat$study.outcome <- paste(dat$study, dat$outcome, sep=".")

which simply pastes 'study' and 'outcome' together into a single variable, so the dataset now looks like this:

study  outcome  x1  x2  y  study.outcome
1      A        3   5   .  1.A
1      B        3   8   .  1.B
2      A        1   3   .  2.A
3      A        2   4   .  3.A
3      B        2   8   .  3.B
3      C        2   6   .  3.C
4      B        4   5   .  4.B

and then fit the model:

rma.mv(y ~ x1 + x2, V, random = list(~ 1 | study, ~ 1 | study.outcome), data=dat)

This will give the exact same results. Here, through the coding of my random effects, I add random effects at the study level and *implicitly* I add random effects for each level of outcome within studies.
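
Continuing the toy example, the equivalence is easy to verify:

dat$study.outcome <- paste(dat$study, dat$outcome, sep=".")
res2 <- rma.mv(y ~ x1 + x2, V=vi, random = list(~ 1 | study, ~ 1 | study.outcome), data=dat)
logLik(res1); logLik(res2)  # identical log-likelihoods (and identical estimates)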

The order in the list is totally irrelevant. So, I could also do:

rma.mv(y ~ x1 + x2, V, random = list(~ 1 | study.outcome, ~ 1 | study), data=dat)

which will again give the same results (but I hope it is clear that '~ 1 | outcome/study' WOULD be rather different -- and pretty nonsensical).
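
Again easy to check with the toy data:

res2b <- rma.mv(y ~ x1 + x2, V=vi, random = list(~ 1 | study.outcome, ~ 1 | study), data=dat)
logLik(res2b)  # same fit as res2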

Note above that 'study.outcome' is unique for every row. So, in fact, I could also just do:

dat$id <- 1:nrow(dat)

study  outcome  x1  x2  y  study.outcome  id
1      A        3   5   .  1.A            1
1      B        3   8   .  1.B            2
2      A        1   3   .  2.A            3
3      A        2   4   .  3.A            4 
3      B        2   8   .  3.B            5
3      C        2   6   .  3.C            6
4      B        4   5   .  4.B            7

and then fit:

rma.mv(y ~ x1 + x2, V, random = list(~ 1 | study, ~ 1 | id), data=dat)

This would again be the same.
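
And once more with the toy data (this works here because every study/outcome combination occurs exactly once):

dat$id <- 1:nrow(dat)
res3 <- rma.mv(y ~ x1 + x2, V=vi, random = list(~ 1 | study, ~ 1 | id), data=dat)
logLik(res3)  # same fit as res1 and res2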

Sidenote: If there are multiple rows with the same level of 'outcome' within 'study' (for one or multiple studies), then the latter is NOT the same anymore. Then one could in fact fit the model:

rma.mv(y ~ x1 + x2, V, random = ~ 1 | study/outcome/id, data=dat)

but I would only consider doing this if a sufficient number of studies provide multiple estimates for the same outcome (as otherwise the 'outcome' and 'id' level variance components will be very difficult to distinguish).
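
As a sketch of that situation, suppose (purely hypothetically) that study 3 in the toy data had reported a second estimate for outcome B:

dat2 <- rbind(dat, data.frame(study=3, outcome="B", x1=2, x2=8, y=0.50, vi=0.050,  # made-up row
                              study.outcome="3.B", id=8))
res4 <- rma.mv(y ~ x1 + x2, V=vi, random = ~ 1 | study/outcome/id, data=dat2)

With just one duplicated outcome, the 'outcome' and 'id' variance components are barely distinguishable (as cautioned above), but this illustrates the specification.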

Okay, but leaving aside this point, let's consider one other model:

rma.mv(y ~ x1 + x2, V, random = list(~ 1 | study, ~ 1 | outcome), data=dat)

Now this is REALLY different. This allows for *crossed* random effects; in this example, it allows 1.A, 2.A, and 3.A to share the same random effect (and 1.B, 3.B, and 4.B share the same random effect; and 3.C is feeling really lonely all by itself, but such is life). And of course 1.A and 1.B share the same study-level random effect, since they come from the same study. And so on.

So, this model not only allows for correlation in estimates coming from the same study, but also allows for correlation in estimates across studies, as long as they measured the same outcome. With '~ 1 | study/outcome', estimates from the same study are allowed to be correlated (since they share the same random effect), but estimates from different studies, whether they measured the same outcome or not, do not share any common random effects and are hence not correlated.
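
With the toy data, one can see directly that this is a different model:

res5 <- rma.mv(y ~ x1 + x2, V=vi, random = list(~ 1 | study, ~ 1 | outcome), data=dat)
logLik(res5)  # will generally differ from logLik(res1)

(With only three outcome levels, the crossed 'outcome' variance component in this toy fit is very poorly estimated; again, the point is just the structure.)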

I hope this clarifies how the coding of predictors and random effects works in metafor.

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Man Hey CHIU
Sent: Tuesday, 26 June, 2018 5:10
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] Questions about fitting a multilevel meta-analysis model with effect sizes measured as different outcomes

(It seems that I accidentally cancelled my last submission, so I have re-sent this email. Sorry for the inconvenience caused.)

Dear Dr. Viechtbauer and All,

I am writing to seek your help with a multilevel meta-analysis using the metafor package you have developed.

In our meta-analysis, we chose to conduct a multilevel meta-analysis because single studies contribute multiple effect sizes (i.e., dependence).

Thus, we are thinking about a 3-level random-effects meta-analysis model, with level 1 being the sampling variation, level 2 being the variation among effect sizes measured as different outcomes within the same study, and level 3 being the variation between studies across the whole dataset.

While running the analyses, we encountered a specific problem concerning the organization of the data file that is entered into the model. We have one column 'StudyName' representing each study, 'Outcome' representing each outcome within a study (note that different studies can have the same outcome), and 'EffectID' representing each unique effect size across all studies (i.e., if the whole dataset has 300 effect sizes, there are 300 different EffectID values).

I wonder whether 'Outcome' (the first line of code below) or 'EffectID' (the second line of code below) should be used to obtain our desired model:

(1) rma.mv(yi=Hedge.g, V = Variances, data=df, slab=StudyName, random = list(~1|Outcome, ~1|StudyName))

(2) rma.mv(yi=Hedge.g, V = Variances, data=df, slab=StudyName, random = list(~1|EffectID, ~1|StudyName))

I have read the demonstration of multilevel meta-analysis based on the sample dataset dat.konstantopoulos2011, in which the 'school' column can have the same code reused across different districts, even though every school is actually unique. In contrast, in my case, different studies can genuinely share the same outcomes. Since my case differs from the example you showed, I am not sure how I should fit my dataset to the model.

Another question is whether the order of the random effects inside list() matters. Since the two lines of code above give the same result, how does the function know which variable is level 2 and which is level 3? Given that level 2 should be clustered within level 3, why do we not have to specify which level each variable belongs to?

The final question concerns adding moderators to the above function. In a multilevel meta-analysis, there may be covariates at level 2 (the outcome level) and at level 3 (the study level), respectively; for example, a similar outcome may be measured across different studies. Would the function automatically detect a covariate as a level-2 covariate when its value differs across the outcomes within a specific cluster (i.e., study), or do we have to specify the level of each covariate?

Thank you in advance for your time and help; we really appreciate your input on these questions.

Regards,
Heidi Chiu

