[R-meta] Coding multi-measure correlational studies for multilevel meta-analysis

Thu Dec 14 04:21:25 CET 2023

Dear Wolfgang, thank you so much. A few observations.

1- This is, as you said, "very tedious to construct". So I wonder: if we
"*want* the model not to give us estimates of 'measure-specific' pooled
correlations", can't we just average "ri" across the different measures
ourselves (maybe using metafor::aggregate.escalc) and, this way, reduce the
data rows for a multi-measure study to 28 rows, just like a single-measure
study?

2- The difficulty of coding these studies extends to other
variable-specific moderators as well. For example, if I want to code the
reliability of the variables in each pair, things again get messy in
multi-measure studies. So, should I average the reliability values for each
variable across its different measures?

3- What if the variable-specific moderators in a multi-measure study were
categorical? Say, qualitative features of the measures used (e.g., standard
vs. researcher-developed). Now, we can't average over this feature for each
variable across different measures. So what can we do?

4- Regarding rcalc(), I actually intentionally didn't use it, because, at
times, studies used multiple samples and times of measurement.
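
A minimal base-R sketch of the averaging idea in point 1, for illustration only. The toy 'ri' values are made up, the unweighted mean is an assumption, and the sketch ignores the sampling variances and covariances that a proper aggregation would have to carry along:

```r
# Toy rows for one hypothetical study; the ri values are invented.
dat <- data.frame(
  study    = 2,
  var1     = c(1, 1, 1, 8),
  var2     = c(2, 8, 8, 8),
  measure2 = c("b", "x", "y", "y"),
  ri       = c(0.1, 0.3, 0.2, 0.3)
)

# Drop the correlation between two measures of the same variable,
# since it does not belong to any var1-var2 pair.
dat <- dat[dat$var1 != dat$var2, ]

# Unweighted mean of ri within each study-by-pair cell.
avg <- aggregate(ri ~ study + var1 + var2, data = dat, FUN = mean)
avg
```

Here the two rows for pair 1-8 (measures x and y) collapse into a single row, so a multi-measure study ends up with one row per pair.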

Thank you,
Yuhang

On Wed, Dec 13, 2023 at 7:06 AM Viechtbauer, Wolfgang (NP) <
wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:

> Hi Yuhang,
>
> First of all, I would suggest creating two separate columns for the two
> variables in each pair, like this:
>
>     study    ri   ni  var1  var2
> 1)  1        .1   85  1     2
> ...
> 28) 1        .2   85  7     8
>
> Then you can use rcalc() to create the var-cov matrix of the (raw or
> r-to-z transformed) correlation coefficients within studies (the 'V'
> matrix), that is, if your dataset is called 'dat', you can do:
>
> tmp <- rcalc(ri ~ var1 + var2 | study, ni=ni, data=dat)
> V <- tmp$V
> dat <- tmp$dat
>
> Sidenote: For 8 variables, there are 8*7/2 correlations (or generally, for
> p variables, p*(p-1)/2 -- this is one of those equations one eventually has
> memorized due to using it so often).
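The counting formula in the sidenote can be checked quickly in R. This is only a sketch; the helper name n_cor is made up for illustration:

```r
# Number of distinct correlations among p variables: p * (p - 1) / 2
n_cor <- function(p) p * (p - 1) / 2

n_cor(8)   # eight variables, one measure each
n_cor(9)   # nine "variables", e.g. one variable measured twice

# Same count as the number of ways to pick 2 out of p
stopifnot(n_cor(8) == choose(8, 2), n_cor(9) == choose(9, 2))
```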
>
> For a study (say, study 2) that used multiple measures for one of the
> variables (say, variable 8), there are then actually 9 variables and hence
> 9*8/2 = 36 correlations. The structure then is:
>
>     study    ri   ni  var1  var2  measure1 measure2
> 1)  2        .1   78  1     2     a        b
> ...
> 7)  2        .3   78  1     8     a        x
> 8)  2        .2   78  1     8     a        y
> 9)  2        .4   78  2     3     b        c
> ...
> 14) 2        .3   78  2     8     b        x
> 15) 2        .2   78  2     8     b        y
> 16) 2        .5   78  3     4     c        d
> ...
> 20) 2        .4   78  3     8     c        x
> 21) 2        .5   78  3     8     c        y
> 22) 2        .1   78  4     5     d        e
> ...
> 25) 2        .0   78  4     8     d        x
> 26) 2        .1   78  4     8     d        y
> 27) 2        .3   78  5     6     e        f
> ...
> 29) 2        .3   78  5     8     e        x
> 30) 2        .2   78  5     8     e        y
> 31) 2        .2   78  6     7     f        g
> 32) 2        .3   78  6     8     f        x
> 33) 2        .1   78  6     8     f        y
> 34) 2        .1   78  7     8     g        x
> 35) 2        .2   78  7     8     g        y
> 36) 2        .3   78  8     8     x        y
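The row skeleton of such a table does not have to be typed by hand. A minimal base-R sketch, assuming the layout shown above for study 2 (the object names vars, measures, idx and skeleton are made up; 'ri' is left as NA to be filled in from the study):

```r
# The 9 "variable.measure" columns of study 2: variables 1-7 have one
# measure each (a-g), variable 8 has two measures (x and y).
vars     <- c(1, 2, 3, 4, 5, 6, 7, 8, 8)
measures <- c("a", "b", "c", "d", "e", "f", "g", "x", "y")

# All index pairs (i < j), one pair per row, in the order used above.
idx <- t(combn(length(vars), 2))

skeleton <- data.frame(
  study    = 2,
  ri       = NA_real_,            # to be entered from the study report
  ni       = 78,
  var1     = vars[idx[, 1]],
  var2     = vars[idx[, 2]],
  measure1 = measures[idx[, 1]],
  measure2 = measures[idx[, 2]]
)
nrow(skeleton)
```

For studies with multiple measures for more than one variable, only vars and measures would need to be extended.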
>
> The actual values used for measure1 and measure2 are irrelevant, as long
> as you use them consistently within a study. For studies that only used a
> single measure for each variable, you can leave measure1 and measure2
> blank. For studies that used multiple measures for more than one variable,
> you have to keep expanding this structure. It just becomes very tedious to
> construct.
>
> Then for rcalc(), you need to paste together var1 and measure1 and var2
> and measure2:
>
> dat$v1m1 <- paste0(dat$var1, ".", dat$measure1)
> dat$v2m2 <- paste0(dat$var2, ".", dat$measure2)
>
> and use those in rcalc():
>
> tmp <- rcalc(ri ~ v1m1 + v2m2 | study, ni=ni, data=dat)
> V <- tmp$V
> dat <- tmp$dat
>
> For the actual model fitted with rma.mv(), you don't use the combination
> of v1m1 and v2m2, but the combination of var1 and var2 as the predictor:
>
> dat$var1var2 <- paste0(dat$var1, ".", dat$var2)
>
> since you *want* the model not to give you estimates of 'measure-specific'
> pooled correlations, but you want to average over multiple measures for the
> same variable. So the model could be:
>
> rma.mv(yi, V, mods = ~ 0 + var1var2, random = ~ var1var2 | study,
> struct="UN", data=dat)
>
> However, this model will need to estimate 28*27/2 = 378 correlations plus
> 28 variances (tau^2 values) for the random effects, so in total 406 (!!)
> parameters (the general equation is p*(p+1)/2), plus the 28 fixed effects.
> That's a lot of parameters in the unstructured var-cov matrix of the random
> effects, so unless you have a lot of studies (hundreds if not thousands),
> this is going to be difficult or essentially impossible. This aside, the
> model allows for no heterogeneity when there are multiple correlations for
> the same var1var2 pair. A simple way to allow for this is to add another
> estimate-specific random effect to the model:
>
> dat$id <- 1:nrow(dat)
> rma.mv(yi, V, mods = ~ 0 + var1var2,
>        random = list(~ var1var2 | study, ~ 1 | id),
>        struct="UN", data=dat)
>
> This is simplistic, since it assumes that the heterogeneity in multiple
> correlations for the same pair is the same regardless of the pair. If you
> have a lot of data, one could try:
>
> rma.mv(yi, V, mods = ~ 0 + var1var2,
>        random = list(~ var1var2 | study, ~ var1var2 | id),
>        struct=c("UN","DIAG"), data=dat)
>
> which would use separate estimate-level random effects for each pair, but
> this adds another 28 parameters to the model. But who cares about another
> 28 if one already has 406 ...
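The parameter counts quoted above can be reproduced with a few lines of base R. This is a sketch that assumes 8 variables, as in the thread; the object names are made up:

```r
# Counts for an unstructured (UN) random-effects var-cov matrix with one
# level per var1var2 pair.
p          <- choose(8, 2)        # number of var1var2 pairs
n_corr_re  <- p * (p - 1) / 2     # correlations among the random effects
n_tau2     <- p                   # one variance (tau^2) per pair
n_un       <- n_corr_re + n_tau2  # same as p * (p + 1) / 2
n_diag     <- p                   # extra variances from the DIAG term

c(pairs = p, correlations = n_corr_re, variances = n_tau2,
  UN_total = n_un, UN_plus_DIAG = n_un + n_diag)
```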
>
> Realistically, one needs to simplify the random effects structure. On the
> opposite end, there is the minimalistic:
>
> res <- rma.mv(yi, V, mods = ~ 0 + var1var2, random = ~ 1 | study/id,
> data=dat)
> res
>
> which, due to its overly simplistic nature, really needs to be followed up
> with:
>
> robust(res, cluster=study, clubSandwich=TRUE)
>
> (could do the same with the models above, but this is less likely to
> matter if one actually manages to fit these complex models).
>
> An interesting question is what kind of structures of intermediate
> complexity one could consider.
>
> But I'll stop here for now, since this is getting way too long anyway.
>
> Best,
> Wolfgang
>
> > -----Original Message-----
> > From: R-sig-meta-analysis <r-sig-meta-analysis-bounces using r-project.org>
> > On Behalf Of Yuhang Hu via R-sig-meta-analysis
> > Sent: Wednesday, December 13, 2023 06:21
> > To: R meta <r-sig-meta-analysis using r-project.org>
> > Cc: Yuhang Hu <yh342 using nau.edu>
> > Subject: [R-meta] Coding multi-measure correlational studies for
> > multilevel meta-analysis
> >
> > Hello Experts,
> >
> > I'm collecting the correlations between 8 variables from several studies.
> > If a study has used a single measure for all these 8 variables, I will
> > need 28 rows (assuming no missing) to capture all those correlations i.e.,
> > var1.var2 = combn(1:8, 2, FUN=\(i)paste(i,collapse = ".")):
> >
> >     study   ri   var1.var2
> > 1)  1        .1   1.2
> >  ...
> > 28) 1       .2    7.8
> >
> > But if a study has used, say, two measures (e.g., 1, 2) for two of those
> > 8 variables (e.g., variables "1" and "2" in 'var1.var2'), then, I wonder
> > how **best** to capture the additional 13 correlations arising due to the
> > additional measure used for "1" and "2" in that study in my data for
> > multilevel modeling purposes?
> >
> > One approach might be to add a single column called, say "measure" to add
> > just those additional rows in that multi-measure study:
> >
> >     study   ri   var1.var2  measure
> > 1)   1       .1    1.2
> >  ...
> > 6)   1       .6     1.7           1
> > 7)   1       .4     1.7           2
> > ...
> > 12)  1       .8     2.7          1
> > 13)  1       .7     2.7          2
> > ...
> >
> > But this looks messy. For instance, what should be the value of "measure"
> > for the var1.var2 rows that have used a single measure (e.g., var1.var2
> > == 1.2)? And can "measure" coded this way be used in the random part of the
> > model (metafor::rma.mv)?
> >
> > Thanks,
> > Yuhang
>
