[R-meta] Question about Meta analysis

Maximilian Steininger maximilian.steininger at univie.ac.at
Mon May 6 17:08:31 CEST 2024


Dear Wolfgang,

As always, many thanks!

> In study 3, there is just a single row. Just to be clear: You are referring to a 'test-retest r = 0.9' but this has no bearing on the sampling variance in V. If it is a within-study design, the computation of its sampling variance already should have been done in such a way that any pre-post correlation is accounted for.

I see, that clears things up.
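
(Noting it down for myself: for a within-study contrast like study 3, the assumed pre-post correlation enters when yi and vi are computed in the first place, e.g. via escalc() as in the sketch below with made-up numbers, and not again in V.)

library(metafor)

# single within-study effect: the assumed pre-post correlation ri = 0.9
# is used here, in the computation of yi and vi themselves
escalc(measure = "SMCR", m1i = 5.2, m2i = 4.6, sd1i = 1.1, ni = 40, ri = 0.9)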

> I am trying to understand your coding for study 4 ("Within-study with one control and two intervention conditions"), which you coded as follows:
>   studyid esid design subgroup type time1 time2 grp1 grp2  ne nc  yi   vi
> 5        4    5      2        1    1     1     2    e    c  40 40 0.5 0.05
> 6        4    6      2        1    1     1     3    e    c  40 40 0.6 0.05
> But this coding implies that there are two independent groups, e and c, where e was measured at time point 1 and c at time points 2 and 3. I am not sure if I really understand this design.


I guess in that case I just mis-specified. If it is a pure within-design (always the same subjects in every condition), then the coding for both grp1 and grp2 should always have the same value (so "e" for each cell)? It seems I got that wrong; thanks for making me aware.
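
If I understood correctly, the fix in my toy data would then be something like this (followed by rebuilding V with the same vcalc() call as before):

# study 4 is a pure within-design: same subjects in every condition,
# so grp2 should not be coded as a separate control group
dat$grp2[dat$studyid == 4] <- "e"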

> For study 6, your coding is:
>   studyid esid design subgroup type time1 time2 grp1 grp2  ne nc  yi   vi
> 11       6   11      2        1    1     1     2    c    c  90 90 1.1 0.05
> 12       6   12      2        1    2     1     2    c    c  90 90 1.2 0.05
> 
> But I think the coding should be:
>   studyid esid design subgroup type time1 time2 grp1 grp2  ne nc  yi   vi
> 11       6   11      2        1    1     1     2    e    e  90 90 1.1 0.05
> 12       6   12      2        1    2     1     2    e    e  90 90 1.2 0.05

Makes sense.
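
In code, I assume that amounts to the same kind of fix:

# study 6 is also a pure within-design, so both group columns get "e"
dat$grp1[dat$studyid == 6] <- "e"
dat$grp2[dat$studyid == 6] <- "e"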

> In study 5, there are two subgroups. Since there is (presumably) no overlap of subjects across subgroups, the sampling errors across subgroups are independent, so we just have two cases of what we have in study 2.

Agreed, and by specifying "random = ~ 1 | studyid/esid" in my model, the dependency among the effect sizes from that study should be taken care of.
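
So, for the archives, my full workflow would then look like this (just a sketch; the last step adds cluster-robust inference via clubSandwich, as discussed earlier in the thread):

# rebuild V after the recodings above, fit the three-level model,
# and apply RVE as a safeguard (requires the clubSandwich package)
V <- vcalc(vi = vi, cluster = studyid, subgroup = subgroup, type = type,
           time1 = time1, time2 = time2, grp1 = grp1, grp2 = grp2,
           w1 = nc, w2 = ne, rho = 0.1, phi = 0.9, data = dat)
res <- rma.mv(yi, V, random = ~ 1 | studyid/esid, data = dat)
robust(res, cluster = studyid, clubSandwich = TRUE)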

> I recently added measure "SMCRP" to escalc(). This uses the pooled SD from pre and post to standardize the difference.

Great! Just out of curiosity: is it based on the approach by Cousineau (2020)? doi: 10.20982/tqmp.16.4.p418
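
(Assuming it follows the usual escalc() interface for change measures, I guess it would be called like this, with made-up values:)

# standardized mean change, standardized by the pooled SD of pre and post
escalc(measure = "SMCRP", m1i = 5.2, m2i = 4.6, sd1i = 1.1, sd2i = 1.3,
       ni = 40, ri = 0.9)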

Thanks a lot for taking the time and for your help!

Best,
Max

> On 06.05.2024 at 13:53, Viechtbauer, Wolfgang (NP) <wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
> 
> Dear Max,
> 
> Please see below for my responses.
> 
> Best,
> Wolfgang
> 
>> -----Original Message-----
>> From: Maximilian Steininger <maximilian.steininger using univie.ac.at>
>> Sent: Friday, April 26, 2024 16:05
>> To: R Special Interest Group for Meta-Analysis <r-sig-meta-analysis using r-project.org>; Viechtbauer, Wolfgang (NP) <wolfgang.viechtbauer using maastrichtuniversity.nl>
>> Subject: Re: [R-meta] Question about Meta analysis
>> 
>> Dear Wolfgang,
>> 
>> Thank you very much for your detailed reply. I also wanted to take the
>> opportunity to thank you for your extremely well-documented resources. They are
>> enormously helpful.
> 
> Thanks for the kind feedback.
> 
>> Indeed, creating the V matrix is not trivial, but your examples are a great
>> guide. If it's not too much to ask (if it is, nevermind!) I would appreciate
>> feedback on my approach. Here is an exemplary structure from my data, which
>> captures all of the dependencies that I have in my real dataset.
>> 
>> Study 1: Between-study with one intervention and one control group.
>> Study 2: Between-study with two interventions and one control group.
>> Study 3: Within-study with one control and one intervention condition (assumed
>> test-retest r = 0.9).
>> Study 4: Within-study with one control and two intervention conditions (assumed
>> test-retest r = 0.9).
>> Study 5: Between-study with two experiments, each containing two interventions
>> and one control group.
>> Study 6: Within-study in which two different (low correlation, r = 0.1) dependent
>> variables were used (assumed test-retest r = 0.9).
>> 
>> This is the (made-up) data set:
>> 
>> studyid = c(1,rep(2,2),3,rep(4,2),rep(5,4),rep(6,2))
>> esid = c(1:12)
>> design = c(rep(1,3), rep(2,3), rep(1,4), rep(2,2))
>> subgroup = c(rep(1,8), rep(2,2), rep(1,2))
>> type = c(rep(1,11),2)
>> time1 = rep(1,12)
>> time2 = c(rep(1,3), rep(2,2), 3, rep(1,4), rep(2,2))
>> grp1 = c("e","e1","e2","e","e","e","e1","e2","e1","e2","e","e")
>> grp2 = rep("c",12)
>> ne = c(10,15,20,30,40,40,45,50,80,100,90,90)
>> nc = c(11,16,16,30,40,40,46,46,61,61,90,90)
>> yi = seq(0.1, 1.2, by = 0.1)
>> vi = rep(0.05, 12)
>> dat = cbind.data.frame(studyid, esid, design, subgroup, type, time1, time2,
>> grp1, grp2, ne, nc, yi, vi)
>> 
>> This would be my V matrix:
>> 
>> V = vcalc(vi=vi, cluster = studyid, subgroup = subgroup, type = type, time1 =
>> time1, time2 = time2, grp1 = grp1, grp2 = grp2, w1 = nc, w2 = ne, rho = 0.1, phi
>> = 0.9, data = dat)
> 
> Thanks for the reproducible example. I had a look:
> 
> blsplit(V, dat$studyid)
> blsplit(V, dat$studyid, cov2cor)
> 
> So, study 1 is just a single row and its sampling variance is as given (0.05).
> 
> In study 2 the correlation between the two effects should be around 0.5ish (it would be exactly 0.5 if you had not specified w1 and w2) due to a shared control group.
> 
> In study 3, there is just a single row. Just to be clear: You are referring to a 'test-retest r = 0.9' but this has no bearing on the sampling variance in V. If it is a within-study design, the computation of its sampling variance already should have been done in such a way that any pre-post correlation is accounted for.
> 
> I am trying to understand your coding for study 4 ("Within-study with one control and two intervention conditions"), which you coded as follows:
> 
>   studyid esid design subgroup type time1 time2 grp1 grp2  ne nc  yi   vi
> 5        4    5      2        1    1     1     2    e    c  40 40 0.5 0.05
> 6        4    6      2        1    1     1     3    e    c  40 40 0.6 0.05
> 
> But this coding implies that there are two independent groups, e and c, where e was measured at time point 1 and c at time points 2 and 3. I am not sure if I really understand this design.
> 
> In study 5, there are two subgroups. Since there is (presumably) no overlap of subjects across subgroups, the sampling errors across subgroups are independent, so we just have two cases of what we have in study 2.
> 
> For study 6, your coding is:
> 
>   studyid esid design subgroup type time1 time2 grp1 grp2  ne nc  yi   vi
> 11       6   11      2        1    1     1     2    c    c  90 90 1.1 0.05
> 12       6   12      2        1    2     1     2    c    c  90 90 1.2 0.05
> 
> But I think the coding should be:
> 
>   studyid esid design subgroup type time1 time2 grp1 grp2  ne nc  yi   vi
> 11       6   11      2        1    1     1     2    e    e  90 90 1.1 0.05
> 12       6   12      2        1    2     1     2    e    e  90 90 1.2 0.05
> 
> although this makes no difference to V. Note that r = 0.9 is again irrelevant here, but for a different reason since it happens to cancel out in the computation of the covariance.
> 
>> And this is how I would specify the meta-analytic model:
>> 
>> res <- rma.mv(yi, V, random = ~ 1 | studyid/esid, data=dat)
>> 
>> What still puzzles me about the V matrix is why no dependence for study 5
>> between experiment 1 and experiment 2 is modeled (which might be unnecessary
>> because this is taken care of by the random effect structure of three-level
>> model?) and why the correlation of the effects in study 4 is 0.95 and not equal
>> to my specification of phi = 0.9.
> 
> See above.
> 
>> I also have a follow-up question regarding the use of SMD with raw score
>> standardization. Calculating SMD for within-designs based on the raw-score
>> metric, as suggested by Becker (1988), induces the problem that most within-
>> studies use a counterbalanced design, and therefore there is no clear SDpre. Can
>> this be ignored, or how should one best deal with it?
> 
> I recently added measure "SMCRP" to escalc(). This uses the pooled SD from pre and post to standardize the difference.
> 
>> Thank you very much for your support. If it is too tedious to answer all my
>> questions, please just ignore them.
>> 
>> Best,
>> Max
>> 
>>> On 23.04.2024 at 11:11, Viechtbauer, Wolfgang (NP) <wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>>> 
>>> Ah, now I get it. Then let me answer your other post here and maybe this will be of use to all.
>>> 
>>> As noted in my answer to Sevilay, this part of the metafor documentation is relevant:
>>> 
>>> https://wviechtb.github.io/metafor/reference/misc-recs.html#general-workflow-for-meta-analyses-involving-complex-dependency-structures
>>> 
>>> This is in essence your Q1, and yes, this is good practice. Not sure if this is 'best' practice. In general, how such complex cases should be handled depends on many factors.
>>> 
>>> Not sure what distinction you are making between this approach and the use of multivariate meta-analysis (combined with RVE), since the three-level model can also be seen as a multivariate meta-analysis, as discussed in these examples:
>>> 
>>> https://www.metafor-project.org/doku.php/analyses:konstantopoulos2011
>>> https://www.metafor-project.org/doku.php/analyses:crede2010
>>> 
>>> A major challenge in cases where there is sampling error dependency is the construction of the V matrix. Many will not even attempt this and will rely on / hope that RVE fixes up the standard errors of the fixed effects. Roughly, this is at least asymptotically true as long as the cluster variable used in RVE encompasses estimates that are potentially dependent (due to whatever reason). In principle, the vcalc() function can handle quite a number of different types of dependencies for constructing the V matrix, but I even struggle at times trying to make it fit to a particular case. For example, this example shows this to some extent:
>>> 
>>> https://wviechtb.github.io/metadat/reference/dat.knapp2017.html
>>> 
>>> The other challenge is the choice of the random effects. Often, people just use a 'simple' three-level model, but more complex structures are certainly possible and may provide a better reflection of the dependency structure. An example where we did not use a V matrix (which would have been hopelessly complex) but used a more complex random effects structure is this:
>>> 
>>> https://wviechtb.github.io/metadat/reference/dat.mccurdy2020.html
>>> 
>>> With respect to your other questions:
>>> 
>>> Q2) Yes, I would say the test-retest reliability can be a decent proxy for estimating the correlation between estimates that are obtained at multiple time points (assuming that the time lags are similar).
>>> 
>>> Q3) As you note, the pre-post correlation is needed to correctly compute the sampling variance of a standardized mean change (with raw score standardization). That's a different issue than using a correlation coefficient to account for the dependency between two such effect sizes. So no, you are not being overly conservative in doing so.
>>> 
>>> Q4) You do not need to 'correct' the control / common comparator group sample size when you account for the dependency via their covariance in the V matrix.
>>> 
>>> Q5) Hard to say without digging into the details of your data. But again, the three-level model *is* already a particular type of multivariate model. This aside, yes, these two ideas -- that there are multiple levels plus multiple types of outcomes -- can certainly be combined.
>>> 
>>> In general, I would say you are asking the right questions and are on the right track, but it is hard to say more without further details.
>>> 
>>> Best,
>>> Wolfgang
>>> 
>>>> -----Original Message-----
>>>> From: Maximilian Steininger <maximilian.steininger using univie.ac.at>
>>>> Sent: Tuesday, April 23, 2024 10:16
>>>> To: R Special Interest Group for Meta-Analysis <r-sig-meta-analysis using r-project.org>
>>>> Cc: Viechtbauer, Wolfgang (NP) <wolfgang.viechtbauer using maastrichtuniversity.nl>
>>>> Subject: Re: [R-meta] Question about Meta analysis
>>>> 
>>>> Dear Wolfgang, dear Sevilay,
>>>> 
>>>> I think Sevilay was referring to my longer message from a few days ago (see below). However, as I am only just starting to familiarise myself with the method, I am unfortunately unable to provide Sevilay with any conclusive/helpful answers.
>>>> 
>>>> I had hoped that my open questions from back then might still be answered, but perhaps they are too obvious or uninformed (or simply too long) and can be answered with more literature research by myself.
>>>> 
>>>> Many thanks in any case for the link, Wolfgang.
>>>> 
>>>> @Sevilay: You can write me a direct message via maximilian.steininger using univie.ac.at, then I can share with you a detailed list of all the resources I used.
>>>> 
>>>> Best,
>>>> Max
>>>> 
>>>>> On 16.04.2024 at 17:47, Maximilian Steininger via R-sig-meta-analysis <r-sig-meta-analysis using r-project.org> wrote:
>>>>> 
>>>>> Dear all,
>>>>> 
>>>>> First of all, thank you for this mailing list and the work that has gone into the responses and the materials linked so far.
>>>>> 
>>>>> I have tried to use the previous answers to solve my specific problem, but I am unsure if my conclusion is correct and appropriate and would appreciate further feedback.
>>>>> 
>>>>> I am a PhD student – so relatively inexperienced – currently running a systematic review and meta-analysis for the first time. My meta-analysis includes several studies (60 studies; with overall 99 effects) that all use the same dependent variable, but that have different designs and thus different forms of dependencies. I have three types of studies:
>>>>> 
>>>>> a) Between-participant designs comparing one (or more) intervention group to a control group.
>>>>> 
>>>>> b) Within-participant designs comparing one (or more) condition to a control condition.
>>>>> 
>>>>> c) Pre-Post control group designs comparing one (or more) intervention group (tested pre- and post-intervention) to a control group (also tested pre- and post-control).
>>>>> 
>>>>> As indicated above, there are studies that report more than one effect. Hence, there is effect-size dependency and/or sampling error dependency. Some studies have multiple intervention groups, some studies have multiple comparison groups, and the within-studies (b) have “multiple follow-up times”, meaning that each participant is tested multiple times on the same outcome. I am a bit confused about how to best model these dependencies, since I came across several approaches.
>>>>> 
>>>>> Initially I wanted to run a multilevel (three-level) meta-analysis with participants (level 1) nested within outcomes (level 2) nested within studies (level 3). However, reading through the archives of this group I figured that this model does not appropriately deal with sampling error dependency.
>>>>> 
>>>>> To deal with this I came across the solution to construct a "working" variance-covariance matrix and input it into my three-level meta-analysis model (using e.g. this approach: https://www.jepusto.com/imputing-covariance-matrices-for-multi-variate-meta-analysis/). Then I would fit this "working model" using metafor and feed it into the clubSandwich package to perform robust variance estimation (RVE). Of course I would conduct sensitivity analyses to check whether feeding different dependencies (i.e. correlation coefficients) into my variance-covariance matrix makes a difference. Q1) Is this the "best" approach to deal with my dependencies?
>>>>> 
>>>>> Alternatively, I came across the approach to use multivariate meta-analysis, again coupled with constructing a "working" variance-covariance matrix. However, I am unsure whether this makes sense because I don't have multiple dependent variables.
>>>>> 
>>>>> Furthermore, I have a couple of questions regarding my dependencies:
>>>>> 
>>>>> Q2) To calculate a "guestimate" for the variance-covariance matrix I need a correlation coefficient. As (almost) always, none is provided in the original studies. Would it be a plausible approach to use the test-retest reliability of my dependent variable (which is reported in a number of other studies not included in the analysis) to guess the correlation?
>>>>> 
>>>>> Q3) For my meta-analysis I use the yi and vi values (i.e. effect sizes and their variances). I calculate these beforehand using the descriptive stats of my studies and formulas suggested by Morris & DeShon (2002). For my effect sizes of the within- (b) as well as pre-post control group designs (c), I already use the test-retest reliability of the dependent variable to estimate the variances of these effect sizes. If I now use these "corrected" effect size variances and run the model, would I use this same correlation to compute my variance-covariance matrix? Am I not then, overly conservatively, "controlling" for this dependency twice (once in the estimation of the individual variances of the effect sizes and once in the model)?
>>>>> 
>>>>> Q4) For between-studies it is suggested to correct the sample size of the control group (by the number of comparisons) if it is compared more than once to an intervention. Do I also have to do this if I calculate a variance-covariance matrix (which should take care of these dependencies already)? Is it enough to calculate the variance-covariance matrix and then use a multilevel or multivariate approach? If it is not enough, do I also have to correct the sample size for within-participant designs (b) as well (e.g., all participants undergo all conditions, so I must correct N by dividing the overall sample size by the number of conditions)?
>>>>> 
>>>>> Q5) Can I combine multivariate and multilevel models with each other and would that be appropriate in my case?
>>>>> 
>>>>> Or is all of this utter nonsense, and would a completely different approach be the best way to go?
>>>>> 
>>>>> Thank you very much for your time and kindness in helping a newcomer to the method.
>>>>> 
>>>>> Best and many thanks,
>>>>> Max


