[R-meta] Coding multi-measure correlational studies for multilevel meta-analysis

Thu Dec 14 17:03:32 CET 2023

Please see my responses below.

Best,
Wolfgang

> -----Original Message-----
> From: Yuhang Hu <yh342 using nau.edu>
> Sent: Thursday, December 14, 2023 04:21
> To: Viechtbauer, Wolfgang (NP) <wolfgang.viechtbauer using maastrichtuniversity.nl>
> Cc: R Special Interest Group for Meta-Analysis <r-sig-meta-analysis using r-
> project.org>
> Subject: Re: [R-meta] Coding multi-measure correlational studies for multilevel
> meta-analysis
>
> Dear Wolfgang, thank you so much. A few observations.
>
> 1- This is, as you said, "very tedious to construct". So, I really wonder if we
> "*want* the model not to give us estimates of 'measure-specific' pooled
> correlations", then, can't we just average (maybe using
> metafor::aggregate.escalc) across "ri" for different measures manually and this
> way, reduce the data rows for a multi-measure study to 28 rows just like a
> single-measure?

To do this correctly, one would still have to compute the var-cov matrix of the correlations one wants to average, because the sampling variance of the average depends on their covariances. Of course, nobody is going to stop you from just averaging them and pretending that the result is a single correlation coefficient. It's just objectively wrong.
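
To illustrate why the covariances matter, here is a minimal base-R sketch with made-up numbers (the three correlations and the matrix V are hypothetical, not taken from any study). It compares the variance of a simple average computed from the full var-cov matrix with the value one gets when the covariances are ignored.

```r
# Hypothetical example: three correlations for the same variable pair,
# obtained with different measures in the same sample
ri <- c(0.30, 0.40, 0.35)

# Hypothetical var-cov matrix of these three correlations
V <- matrix(c(0.010, 0.006, 0.005,
              0.006, 0.012, 0.007,
              0.005, 0.007, 0.011), nrow = 3, byrow = TRUE)

k <- length(ri)
w <- rep(1 / k, k)                  # equal weights, i.e., a simple average

r_avg   <- sum(w * ri)              # the averaged correlation
v_avg   <- drop(t(w) %*% V %*% w)   # variance of the average using the full V
v_naive <- sum(diag(V)) / k^2       # variance if the covariances are ignored

c(r_avg = r_avg, v_avg = v_avg, v_naive = v_naive)
```

When the covariances are positive, as in this made-up example, ignoring them understates the variance of the average.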

> 2- The difficulty of coding these studies extends to other variable-specific
> moderators as well. For example, if I want to code for the reliability of the
> variables in each pair, there again, things get messy in multi-measure studies.
> So, here I should average over reliability values for each variable across
> different measures?

As above.

> 3- What if the variable-specific moderators in a multi-measure study were
> categorical? Say, qualitative features of the measures used (e.g., standard vs.
> researcher-developed). Now, we can't average over this feature for each variable
> across different measures. So what can we do?

This would in fact be an argument for not averaging and instead including the various correlations corresponding to the different measures (with a measure-specific moderator variable).
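
As a rough sketch of what such coding could look like (all column names and values here are hypothetical), each row keeps its own correlation, and the qualitative feature of the measure used for each variable is stored next to it. A pair-level moderator can then be derived from the two.

```r
# Hypothetical rows from one study in which variable 8 was measured in two ways
dat <- data.frame(
  study    = c(2, 2, 2, 2),
  var1     = c(1, 1, 2, 2),
  var2     = c(8, 8, 8, 8),
  measure1 = c("a", "a", "b", "b"),
  measure2 = c("x", "y", "x", "y"),
  ri       = c(0.3, 0.2, 0.3, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table giving the type of each measure
types <- c(a = "standard", b = "standard", x = "standard", y = "researcher")

# Attach the measure type for each side of the pair
dat$type1 <- unname(types[dat$measure1])
dat$type2 <- unname(types[dat$measure2])

# Pair-level moderator: were both variables measured with a standard measure?
dat$both_standard <- dat$type1 == "standard" & dat$type2 == "standard"

dat
```

A variable such as both_standard (or type1 and type2 directly) could then be added to the mods formula; which coding makes most sense depends on the research question.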

> 4- Regarding rcalc(), I actually intentionally didn't use it, because, at times,
> studies used multiple samples and times of measurement.

But rcalc() can handle this. Multiple samples are independent, so those just need to be treated (for the purposes of rcalc()) as different studies. And there is, from a computational point of view, no difference between multiple measures and multiple time points. Both just create more variables. Of course this complicates the construction of the dataset used as input to rcalc(), but that's a practical issue.
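
A minimal sketch of the relabeling this implies, in base R with hypothetical column names (sample, time1, and time2 are assumptions and not part of the datasets shown in this thread): independent samples get their own cluster identifier, and each variable label is extended by measure and time point, so that every distinct combination counts as a separate variable.

```r
# Hypothetical rows: one study with two independent samples, where
# variable 2 was measured at two time points
dat <- data.frame(
  study    = c(1, 1, 1, 1),
  sample   = c(1, 1, 2, 2),
  var1     = c(1, 1, 1, 1),
  var2     = c(2, 2, 2, 2),
  measure1 = c("a", "a", "a", "a"),
  measure2 = c("b", "b", "b", "b"),
  time1    = c(1, 1, 1, 1),
  time2    = c(1, 2, 1, 2),
  ri       = c(0.1, 0.2, 0.3, 0.2),
  ni       = c(85, 85, 60, 60),
  stringsAsFactors = FALSE
)

# Independent samples are treated as separate 'studies' (clusters)
dat$cluster <- paste0(dat$study, ".", dat$sample)

# Each variable/measure/time combination becomes its own variable label
dat$v1 <- paste(dat$var1, dat$measure1, dat$time1, sep = ".")
dat$v2 <- paste(dat$var2, dat$measure2, dat$time2, sep = ".")

dat[, c("cluster", "v1", "v2", "ri", "ni")]
```

The columns v1, v2, and cluster would then take the place of the two variable identifiers and the study identifier in the rcalc() call.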

I totally get that constructing such a dataset is quite difficult and time-consuming, if not impossible. What I describe below is the ideal approach, where one constructs the dataset for a study so that it includes every possible pair of variables, where variables can reflect different constructs, multiple measures of the same construct, and/or multiple time points. If this is not possible for logistical/practical reasons, then one would have to consider alternative approaches. A rough var-cov matrix could still be constructed with the vcalc() function. One could even go as far as pretending that V is diagonal. In any case, cluster-robust inference methods should then be used.
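
To make the rougher alternatives concrete: the idea is a block-diagonal V in which estimates from the same study share an assumed correlation. The sketch below builds such a matrix by hand in base R purely to show its structure. The helper rough_V(), the sampling variances, and the value rho = 0.6 are all arbitrary assumptions for illustration; in practice one would let vcalc() do this work. Setting rho to 0 gives the diagonal V.

```r
# Hypothetical helper: block-diagonal V with a common assumed correlation
# 'rho' among the sampling errors of estimates from the same cluster
rough_V <- function(vi, cluster, rho) {
  sds  <- sqrt(vi)
  same <- outer(cluster, cluster, "==")   # TRUE where two rows share a cluster
  R    <- ifelse(same, rho, 0)            # assumed correlation matrix
  diag(R) <- 1
  R * outer(sds, sds)                     # turn correlations into covariances
}

# Hypothetical sampling variances for five estimates from two studies
vi    <- c(0.010, 0.012, 0.011, 0.020, 0.018)
study <- c(1, 1, 1, 2, 2)

V  <- rough_V(vi, study, rho = 0.6)   # rough guess at the dependence
V0 <- rough_V(vi, study, rho = 0)     # pretending V is diagonal

round(V, 4)
```

Because such a V is only an approximation, the standard errors from the fitted model should not be taken at face value, which is why the cluster-robust step matters here.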

> Thank you,
> Yuhang
>
> On Wed, Dec 13, 2023 at 7:06 AM Viechtbauer, Wolfgang (NP)
> <mailto:wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
> Hi Yuhang,
>
> First of all, I would suggest creating two separate columns for the two
> variables in each pair, like this:
>
>     study    ri   ni  var1  var2
> 1)  1        .1   85  1     2
> ...
> 28) 1        .2   85  7     8
>
> Then you can use rcalc() to create the var-cov matrix of the (raw or r-to-z
> transformed) correlation coefficients within studies (the 'V' matrix), that is,
> if your dataset is called 'dat', you can do:
>
> tmp <- rcalc(ri ~ var1 + var2 | study, ni=ni, data=dat)
> V <- tmp$V
> dat <- tmp$dat
>
> Sidenote: For 8 variables, there are 8*7/2 correlations (or generally, for p
> variables, p*(p-1)/2 -- this is one of those equations one eventually has
> memorized due to using it so often).
>
> For a study (say, study 2) that used multiple measures for one of the variables
> (say, variable 8), there are then actually 9 variables and hence 9*8/2 = 36
> correlations. The structure then is:
>
>     study    ri   ni  var1  var2  measure1 measure2
> 1)  2        .1   78  1     2     a        b
> ...
> 7)  2        .3   78  1     8     a        x
> 8)  2        .2   78  1     8     a        y
> 9)  2        .4   78  2     3     b        c
> ...
> 14) 2        .3   78  2     8     b        x
> 15) 2        .2   78  2     8     b        y
> 16) 2        .5   78  3     4     c        d
> ...
> 20) 2        .4   78  3     8     c        x
> 21) 2        .5   78  3     8     c        y
> 22) 2        .1   78  4     5     d        e
> ...
> 25) 2        .0   78  4     8     d        x
> 26) 2        .1   78  4     8     d        y
> 27) 2        .3   78  5     6     e        f
> ...
> 29) 2        .3   78  5     8     e        x
> 30) 2        .2   78  5     8     e        y
> 31) 2        .2   78  6     7     f        g
> 32) 2        .3   78  6     8     f        x
> 33) 2        .1   78  6     8     f        y
> 34) 2        .1   78  7     8     g        x
> 35) 2        .2   78  7     8     g        y
> 36) 2        .3   78  8     8     x        y
>
> The actual values used for measure1 and measure2 are irrelevant, as long as you
> use them consistently within a study. For studies that only used a single
> measure for each variable, you can leave measure1 and measure2 blank. For
> studies that used multiple measures for more than one variable, you have to keep
> expanding this structure. It just becomes very tedious to construct.
>
> Then for rcalc(), you need to paste together var1 and measure1 and var2 and
> measure2:
>
> dat$v1m1 <- paste0(dat$var1, ".", dat$measure1)
> dat$v2m2 <- paste0(dat$var2, ".", dat$measure2)
>
> and use those in rcalc():
>
> tmp <- rcalc(ri ~ v1m1 + v2m2 | study, ni=ni, data=dat)
> V <- tmp$V
> dat <- tmp$dat
>
> For the actual model fitted with rma.mv(), you don't use the combination
> of v1m1 and v2m2, but the combination of var1 and var2 as the predictor:
>
> dat$var1var2 <- paste0(dat$var1, ".", dat$var2)
>
> since you *want* the model not to give you estimates of 'measure-specific'
> pooled correlations, but you want to average over multiple measures for the same
> variable. So the model could be:
>
> rma.mv(yi, V, mods = ~ 0 + var1var2, random = ~ var1var2 | study,
> struct="UN", data=dat)
>
> However, this model will need to estimate 28*27/2 = 378 correlations plus 28
> variances (tau^2 values) for the random effects, so in total 406 (!!) parameters
> (the general equation is p*(p+1)/2), plus the 28 fixed effects. That's a lot of
> parameters in the unstructured var-cov matrix of the random effects, so unless
> you have a lot of studies (hundreds if not thousands), this is going to be
> difficult or essentially impossible. This aside, the model allows for no
> heterogeneity when there are multiple correlations for the same var1var2 pair. A
> simple way to allow for this is to add another estimate-specific random effect
> to the model:
>
> dat$id <- 1:nrow(dat)
> rma.mv(yi, V, mods = ~ 0 + var1var2, random = list(~ var1var2 | study, ~
> 1 | id), struct="UN", data=dat)
>
> This is simplistic, since it assumes that the heterogeneity in multiple
> correlations for the same pair is the same regardless of the pair. If you have a
> lot of data, one could try:
>
> rma.mv(yi, V, mods = ~ 0 + var1var2, random = list(~ var1var2 | study, ~
> var1var2 | id), struct=c("UN","DIAG"), data=dat)
>
> which would use separate estimate-level random effects for each pair, but this
> adds another 28 parameters to the model. But who cares about another 28 if one
> already has 406 ...
>
> Realistically, one needs to simplify the random effects structure. On the
> opposite end, there is the minimalistic:
>
> res <- rma.mv(yi, V, mods = ~ 0 + var1var2, random = ~ 1 | study/id,
> data=dat)
> res
>
> which, due to its overly simplistic nature, really needs to be followed up with:
>
> robust(res, cluster=study, clubSandwich=TRUE)
>
> (could do the same with the models above, but this is less likely to matter if
> one actually manages to fit these complex models).
>
> An interesting question is what kind of structures of intermediate complexity
> one could consider.
>
> But I'll stop here for now, since this is getting way too long anyway.
>
> Best,
> Wolfgang
>
> > -----Original Message-----
> > From: R-sig-meta-analysis <mailto:r-sig-meta-analysis-bounces using r-project.org>
> On Behalf
> > Of Yuhang Hu via R-sig-meta-analysis
> > Sent: Wednesday, December 13, 2023 06:21
> > To: R meta <mailto:r-sig-meta-analysis using r-project.org>
> > Cc: Yuhang Hu <mailto:yh342 using nau.edu>
> > Subject: [R-meta] Coding multi-measure correlational studies for multilevel
> > meta-analysis
> >
> > Hello Experts,
> >
> > I'm collecting the correlations between 8 variables from several studies.
> > If a study has used a single measure for all these 8 variables, I will need
> > 28 rows (assuming no missing) to capture all those correlations i.e.,
> > var1.var2 = combn(1:8, 2, FUN=\(i)paste(i,collapse = ".")):
> >
> >     study   ri   var1.var2
> > 1)  1        .1   1.2
> >  ...
> > 28) 1       .2    7.8
> >
> > But if a study has used, say, two measures (e.g., 1, 2) for two of those 8
> > variables (e.g., variables "1" and "2" in 'var1.var2'), then, I wonder how
> > **best** to capture the additional 13 correlations arising due to the
> > additional measure used for "1" and "2" in that study in my data for
> > multilevel modeling purposes?
> >
> > One approach might be to add a single column called, say "measure" to add
> > just those additional rows in that multi-measure study:
> >
> >     study   ri   var1.var2  measure
> > 1)   1       .1    1.2
> >  ...
> > 6)   1       .6     1.7           1
> > 7)   1       .4     1.7           2
> > ...
> > 12)  1       .8     2.7          1
> > 13)  1       .7     2.7          2
> > ...
> >
> > But this looks messy. For instance, what should be the value of "measure"
> > for the var1.var2 rows that have used a single measure (e.g., var1.var2 ==
> > 1.2)? And can "measure" coded this way be used in the random part of the
> > model (metafor::rma.mv)?
> >
> > Thanks,
> > Yuhang