# [R-meta] Construct the covariance-matrices of different effect sizes

James Pustejovsky jepu@to @end|ng |rom gm@||@com
Wed Jan 20 17:02:02 CET 2021

```Hi Tzlil,

(1) For the pearson correlations and ICCs, I think the data might need to
include some sort of label that distinguishes unique measures and
time-points. So for example, submax-T1, submax-T2, max-T1, max-T2. Also, if
you could provide a reproducible example with data that someone could
easily read in to R (using e.g., dput()), it will increase the probability
that others will chime in with coding/debugging suggestions. There are some
links to how to do this here:
https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis

(2) Yes, impute_covariance_matrix() works just fine with raw mean
differences. The correlation you input into the function corresponds
exactly to the correlation between outcomes.

(3) Thanks for linking to the earlier post. The formula Wolfgang gave is
probably derived by the delta method.

Just to illustrate how using these formulas would work, if assume a
correlation between outcomes of r = 0.7, then that means that the
correlation between raw mean differences estimates should also be 0.7, but
that the correlation between SDLN estimates should be r^2 = 0.49. And the
correlation between ICCs would be something else yet again, according to
the formulas implemented in rcalc(). One implication of this is that if you
want to conduct sensitivity analyses from r = 0.0 to r = 0.9, then the
sensitivity analyses for the raw mean difference estimates should be over
the range r = 0.0 to r = 0.9, but the sensitivity for SDLN values should
run r = 0.0, r = 0.01, r = 0.04, r = 0.09,..., r = 0.81.

James

On Tue, Jan 19, 2021 at 10:02 PM Tzlil Shushan <tzlil21092 using gmail.com> wrote:

> Hi James,
>
> Thank you so much for the response and further explanations around the
> merits associated with V-matrices in correlated datasets. I think that both
> you and Wolfgang have already provided a thorough insight on this in
> previous threads in the mailing list and papers as this recent one
> https://osf.io/x8yre/
> However, I struggle to be confident with the formulas I use for each
> effect index, and since my math background isn't wide, I found it hard to
> extract them from papers and I'll be super grateful to get your help here :)
>
> Since I don't have enough information on the correlation between outcomes,
> I'm using 'guesstimate' values and back it up with sensitivity analysis (at
> first glance, with my current formulas for V-matrices, values range from 0
> to 0.9 correlation did not yield changes in the results).
>
> (1) I might start with my Pearson correlation and ICC, because they are
> probably the tricky ones, and up to now I don't really have something to
> support on besides impute_covariance_matrix formula. For both datasets I
> used r to Fisher's z transformation to approximate normal distributed
> datasets.
>
> ICC is simply the ICC between test-retest (submaximal heart rate values).
> ICCs from the same sample can be derived from the number of test-retests
> (i.e. more than one paired tests), different intensities, different test
> duration etc., which I investigate  separately as meta-regression. However,
> I'm first interested in the average effect.
>
> Pearson correlations are derived from the association between heart rate
> at submaximal intensity(ies) and criterion measure of aerobic fitness
> (let's say max test). In theory, we expect an inverse relationship–lower
> heart rate at submaximal intensity associated with higher result in max
> test. Similarly to ICC, these effects are obtained from different
> (submaximal) intensities, time-points across the season etc. I want to
> extract the average effect, and then continue to moderator analyses.
>
> ## these are datasets examples
>
> es.id study.id  n  icc     yi     vi
> 1     1         25 0.90 1.4722 0.0455
> 2     1         25 0.84 1.2212 0.0455
> 3     2         16 0.72 0.9076 0.0769
> 4     2         16 0.85 1.2562 0.0769
> 5     2         16 0.83 1.1881 0.0769
> 6     3         38 0.92 1.5890 0.0286
>
> es.id study.id  n     r      yi           vi
> 1           1       25 -0.61 -0.7089 0.0455
> 2           1       25 -0.58 -0.6625 0.0455
> 3           2       16 -0.58 -0.6625 0.0769
> 4           2       16 -0.60 -0.6931 0.0769
> 5           2       16 -0.56 -0.6328 0.0769
> 6           3       38 -0.55 -0.6184 0.0286
>
> I tried to implement rcalc() with my dataset, but the thing is that I have
> two variables that are constant (submax and submax tests for ICC and submax
> and max tests for *r*) that were examined repeatedly within samples,
> while the formula requires different paired variables within sample/studies
> as the example here
> https://wviechtb.github.io/metafor/reference/rcalc.html Is it right? I'm
> quite lost here!
>
> (2) As you mentioned, I construct the correlation between raw mean
> difference using impute_covariance_matrix(). However, as I understand, it
> is more useful to use with standardised mean difference effects? The same
> is true for raw mean difference?
>
> (3) Yes, my effect size index for SDs is the the log transformed SD (plus
> bias correction for sample size) using the method proposed by Nakagawa
> et.al 2015
>
> https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12309 ("SDLN"
> in escal function) whereby the sampling variance is constructed solely
> based on sample size 1/2(n-1). To construct the V-matrix I used Wolfgang
> response in the mailing list regarding a similar question (see below)
>  https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2019-July/001630.html
> Is this the right (or a fair) way to approach this? Does the Delta method
> work better here? If so, can you refer me to something that I can work with.
>
> ## this is a small example of my dataset
>
> es.id study.id  n   SD     yi     vi
> 1         1         25 1.45 0.3924 0.0208
> 2         1         25 1.41 0.3644 0.0208
> 3         2         16 1.05 0.0821 0.0333
> 4         2         16 1.20 0.2157 0.0333
> 5         2         16 1.27 0.2724 0.0333
> 6         3          38 1.24 0.2286 0.0135
>
> I'll be grateful to get any insight from you and others on the mailing
> list..
>
> Thanks and Kind regards,
>
> Tzlil Shushan | Sport Scientist, Physical Preparation Coach
>
> BEd Physical Education and Exercise Science
> MSc Exercise Science - High Performance Sports: Strength &
> Conditioning, CSCS
> PhD Candidate Human Performance Science & Sports Analytics
>
>
>
> ‫בתאריך יום ד׳, 20 בינו׳ 2021 ב-1:53 מאת ‪James Pustejovsky‬‏ <‪
> jepusto using gmail.com‬‏>:‬
>
>> Hi Tzlil,
>>
>> Apologies for the long delay in responding to your query. You've raised
>> some excellent questions about rather subtle issues. To your question (1),
>> I would say that it is NOT compulsory to use a sampling variance-covariance
>> matrix (the "V-matrix") for the effect sizes of each type. Omitting the
>> V-matrix amounts to assuming a correlation of zero. If your goal is
>> primarily to understand average effect sizes (of each type), then using
>> robust standard errors/hypothesis tests/confidence intervals will work even
>> if you have a mis-specified assumption about the correlation among effect
>> sizes.
>>
>> That said, there are at least two potential benefits to using a V-matrix
>> based on more plausible assumptions about the correlation between effect
>> sizes. First, using a working model that is closer to the true dependence
>> structure will yield a more precise estimate of the average effect. If
>> you're just estimating an average effect of each type, the gain in
>> precision will probably be pretty small. If you're estimating a
>> meta-regression with a more complex set of predictors, the gains can be
>> more substantial.
>>
>> The second potential benefit is that using a plausible V-matrix will give
>> you better, more defensible estimates of the variance components
>> (between-study variance and within-study variance). Whether based on REML
>> or some other estimation method, the variance component estimates are NOT
>> robust to mistaken assumptions about the sampling correlation structure.
>> They'll be biased unless you have the sampling correlation structure
>> approximately correct. So to the extent that understanding heterogeneity is
>> important, I think it's worth working on building in a V-matrix.
>>
>> To your question (2), I like the approach you've outlined, where you use
>> different V-matrices for each of the effect indices you're looking at. I
>> think ideally, you would start by making a single assumption about the
>> degree of correlation between the *outcomes*, and then using that to
>> derive the appropriate degree of correlation between each of the indices:
>> * For raw mean differences, the correlation between outcomes will
>> translate directly into the correlation between mean differences.
>> * For SDs, I'm not sure exactly what your ES index is. Is it the log
>> ratio of SDs? How did you arrive at the formula for the correlation between
>> effect sizes? I don't know of a source for this, but it could be derived
>> via the delta method.
>> * For ICCs and pearson correlations, using Wolfgang's function would be
>> the way to go. Perhaps if you can provide a small example of your data and
>> syntax that you've attempted, folks on the list can provide guidance about
>> applying the function.
>>
>> Kind Regards,
>> James
>>
>> On Thu, Jan 7, 2021 at 5:16 PM Tzlil Shushan <tzlil21092 using gmail.com>
>> wrote:
>>
>>> Dear Wolfgang and James,
>>>
>>> Apologise for the long assay in advance..
>>>
>>> In my meta-analysis I obtained different effect sizes coming
>>> from test-retest and correlational designs. Accordingly, I performed 4
>>> different meta-analyses for each effect size:
>>>
>>> Raw mean difference of test-retest
>>> Standard deviation (using Nakagawa et al. 2015 approach) of test-retest
>>> Intraclass correlation (transformed to z fisher values) of test-retest
>>> Pearson correlation coefficient (transformed to z fisher values) derived
>>> from the same test against criterion measure.
>>>
>>> Because many studies meeting inclusion criteria provided more than one
>>> effect size through various ways of repeated measures (for example,
>>> multiple intensities of the test, repeated measures across the year), which
>>> all based on a common sample of participants, I treated each unique sample
>>> as an independent study (NOTE: this approach serves our purposes on the
>>> best way and adding further level will results in low number of
>>> clusters–which I don't want, given the use of RVE).
>>>
>>> Thanks to the great discussions in this group, we've done the following:
>>> (1) used rma.mv() to estimate the overall average estimate and the
>>> variance in hierarchical working model. The same for meta-regressions we
>>> performed.
>>>
>>> (2) Compute robust variance estimates with robust() and coef_test()
>>> functions, clustering at the level of studies (the same is true for both
>>> overall models and meta-regression).
>>>
>>> However, after reading some threads in the groups in the last weeks
>>>
>>> https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2021-January/002565.html
>>>
>>> https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2018-February/000647.html
>>> and more...I think that one step further is to
>>> provide variance-covariance matrices for each meta-analysis before step 1
>>> and 2 noted above.
>>>
>>> In this regard I have some other questions:
>>>
>>> (1) Is it compulsory to create (an estimate) variance-covariance given
>>> the structure of my dataset?
>>>
>>> (2) IF YES, I'm not sure if I can use the same covariance formulas for
>>> all effect sizes. For example, impute_covariance_matrix() from clubSandwich
>>> can work fine with all effect sizes (mean diff, SD, icc etc.)? or should I
>>> estimate the covariance-matrix with a unique function for each effect size?
>>>
>>> Based on reading and suggestions:
>>> • I used impute_covariance_matrix() for mean difference.
>>>
>>> • For standard deviation I constructed the formula below:
>>> calc.v <- function(x) {
>>> v <- matrix(r^2/(2*x\$ni-1), nrow=nrow(x), ncol=nrow(x))
>>> diag(v) <- x\$vi
>>> v
>>> }
>>> V <- bldiag(lapply(split(dat, dat\$study), calc.v))
>>> http://www.metafor-project.org/doku.php/analyses:gleser2009
>>>
>>> • for icc and pearson correlation I've looked at this
>>> https://wviechtb.github.io/metafor/reference/rcalc.html but I couldn't
>>> create something which is appropriate to my dataset (I don't really know
>>> how to specify var1 and var2).
>>>
>>> With this regard, I created a sensitivity analysis (with 0.3, 0.5, 0.7
>>> and 0.9) which revealed similar overall estimates (also similar to the
>>> working models without covariance-matrix), albeit, changed a bit the
>>> magnitude of sigma2.1 and sigma2.2
>>>
>>> I'll be thankful to get any thoughts..
>>>
>>> Kind regards and thanks in advance!
>>>
>>> Tzlil
>>>
>>> Tzlil Shushan | Sport Scientist, Physical Preparation Coach
>>>
>>> BEd Physical Education and Exercise Science
>>> MSc Exercise Science - High Performance Sports: Strength &
>>> Conditioning, CSCS
>>> PhD Candidate Human Performance Science & Sports Analytics
>>>
>>>

[[alternative HTML version deleted]]

```