[R-meta] Construct the covariance-matrices of different effect sizes

Tzlil Shushan tzlil21092 at gmail.com
Thu Jan 21 00:39:14 CET 2021


Hi James,

That's great!

Here is a small example using dput() (I hope that's what you meant). I
also included two of my moderators, in case someone can provide the syntax
for the overall average effect (without moderators) and also with a moderator.

dat <- structure(list(
  es.id = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L),
    .Label = c("1", "10", "2", "3", "4", "5", "6", "7", "8", "9"),
    class = "factor"),
  study.id = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 5L),
    .Label = c("1", "2", "3", "4", "5"), class = "factor"),
  n = c(25, 25, 16, 16, 16, 38, 28, 28, 14, 14),
  r = c(-0.61, -0.58, -0.58, -0.6, -0.56, -0.55, -0.5, -0.5, -0.48, -0.42),
  mean.age = c(23, 23, 18.7, 18.7, 18.7, 25, 22.5, 22.5, 20.1, 20.1),
  comp.level = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L),
    .Label = c("E", "SE"), class = "factor"),
  var1 = c("submax-T1", "submax-T2", "submax-T1", "submax-T2", "submax-T3",
           "submax-T1", "submax-T1", "submax-T2", "submax-T1", "submax-T2"),
  var2 = c("max-T1", "max-T2", "max-T1", "max-T2", "max-T3",
           "max-T1", "max-T1", "max-T2", "max-T1", "max-T2"),
  yi = structure(c(-0.708921359427408, -0.662462707371799,
                   -0.662462707371799, -0.693147180559945,
                   -0.632833186665638, -0.618381313574463,
                   -0.549306144334055, -0.549306144334055,
                   -0.522984277591344, -0.447692023527421),
    ni = c(25, 25, 16, 16, 16, 38, 28, 28, 14, 14), measure = "ZCOR"),
  vi = c(0.0454545454545455, 0.0454545454545455, 0.0769230769230769,
         0.0769230769230769, 0.0769230769230769, 0.0285714285714286,
         0.04, 0.04, 0.0909090909090909, 0.0909090909090909)),
  row.names = c(NA, -10L), class = c("escalc", "data.frame"),
  digits = c(est = 4, se = 4, test = 4, pval = 4, ci = 4, var = 4,
             sevar = 4, fit = 4, het = 4),
  yi.names = "yi", vi.names = "vi")

library(metafor)

tmp <- rcalc(r ~ var1 + var2 | study.id, ni = n, data = dat)  # this is the call that errors
V <- tmp$V      # the sampling variance-covariance matrix
dat <- tmp$dat  # the restructured dataset
dat[dat$study.id == 1, ]

I've tried to follow the description here
https://wviechtb.github.io/metafor/reference/rcalc.html but got an error.

Happy to hear that for the mean differences and SDs I'm on the right path! I
hope to get to the same place with my Pearson correlation and ICC data :)

Thanks James!

Kind regards,

Tzlil Shushan | Sport Scientist, Physical Preparation Coach

BEd Physical Education and Exercise Science
MSc Exercise Science - High Performance Sports: Strength &
Conditioning, CSCS
PhD Candidate Human Performance Science & Sports Analytics



On Thu, 21 Jan 2021 at 3:02, James Pustejovsky <jepusto using gmail.com>
wrote:

> Hi Tzlil,
>
> Some responses to your follow-ups:
>
> (1) For the Pearson correlations and ICCs, I think the data might need to
> include some sort of label that distinguishes unique measures and
> time-points. So for example, submax-T1, submax-T2, max-T1, max-T2. Also, if
> you could provide a reproducible example with data that someone could
> easily read into R (using, e.g., dput()), it will increase the probability
> that others will chime in with coding/debugging suggestions. There are some
> links to how to do this here:
> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
>
> (2) Yes, impute_covariance_matrix() works just fine with raw mean
> differences. The correlation you input into the function corresponds
> exactly to the correlation between outcomes.
>
> (3) Thanks for linking to the earlier post. The formula Wolfgang gave is
> probably derived by the delta method.
>
> Just to illustrate how these formulas would work: if we assume a
> correlation between outcomes of r = 0.7, then the correlation between raw
> mean difference estimates should also be 0.7, while the correlation
> between SDLN estimates should be r^2 = 0.49. And the
> correlation between ICCs would be something else yet again, according to
> the formulas implemented in rcalc(). One implication of this is that if you
> want to conduct sensitivity analyses from r = 0.0 to r = 0.9, then the
> sensitivity analyses for the raw mean difference estimates should be over
> the range 0.0 to 0.9, but the sensitivity analyses for the SDLN values
> should run over 0.0, 0.01, 0.04, 0.09, ..., 0.81 (the squared values).
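>
> For concreteness, here is a minimal R sketch of that mapping (the r^2
> rule for the SDLN estimates is the delta-method result mentioned above):
>
> r <- seq(0, 0.9, by = 0.1)
> data.frame(r_outcomes  = r,    # assumed correlation between outcomes
>            r_mean_diff = r,    # implied correlation between raw mean differences
>            r_sdln      = r^2)  # implied correlation between SDLN estimates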
>
> James
>
>
>
> On Tue, Jan 19, 2021 at 10:02 PM Tzlil Shushan <tzlil21092 using gmail.com>
> wrote:
>
>> Hi James,
>>
>> Thank you so much for the response and the further explanations around the
>> merits of V-matrices in correlated datasets. I think both you and Wolfgang
>> have already provided thorough insight on this in previous threads on the
>> mailing list and in papers such as this recent one:
>> https://osf.io/x8yre/
>> However, I struggle to be confident with the formulas I use for each
>> effect size index, and since my math background isn't strong, I've found it
>> hard to extract them from papers, so I'd be super grateful for your help here :)
>>
>> Since I don't have enough information on the correlation between
>> outcomes, I'm using 'guesstimate' values and backing them up with
>> sensitivity analyses (at first glance, with my current formulas for the
>> V-matrices, correlation values ranging from 0 to 0.9 did not change the results).
>>
>> (1) I might start with my Pearson correlations and ICCs, because they are
>> probably the tricky ones, and so far I don't really have anything to rely
>> on besides the impute_covariance_matrix() formula. For both datasets I
>> used the r-to-Fisher's-z transformation to make the sampling distributions
>> approximately normal.
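>>
>> As a minimal sketch, that step via metafor's escalc() (measure "ZCOR", as
>> in the dput() above; for ZCOR the sampling variance is simply 1/(n - 3)).
>> This is for the Pearson dataset; the icc column would be passed as ri in
>> the same way for the ICC dataset:
>>
>> library(metafor)
>> dat <- escalc(measure = "ZCOR", ri = r, ni = n, data = dat)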
>>
>> The ICC is simply the test-retest ICC (of submaximal heart rate values).
>> ICCs from the same sample can be derived from the number of test-retests
>> (i.e. more than one paired test), different intensities, different test
>> durations, etc., which I investigate separately via meta-regression.
>> However, I'm first interested in the average effect.
>>
>> The Pearson correlations are derived from the association between heart
>> rate at submaximal intensity(ies) and a criterion measure of aerobic
>> fitness (say, a max test). In theory, we expect an inverse relationship:
>> lower heart rate at submaximal intensity is associated with a higher
>> result in the max test. As with the ICCs, these effects are obtained from
>> different (submaximal) intensities, time points across the season, etc. I
>> want to extract the average effect, and then continue to moderator analyses.
>>
>> ## these are datasets examples
>>
>> es.id study.id  n  icc     yi     vi
>> 1     1        25 0.90 1.4722 0.0455
>> 2     1        25 0.84 1.2212 0.0455
>> 3     2        16 0.72 0.9076 0.0769
>> 4     2        16 0.85 1.2562 0.0769
>> 5     2        16 0.83 1.1881 0.0769
>> 6     3        38 0.92 1.5890 0.0286
>>
>> es.id study.id  n     r      yi     vi
>> 1     1        25 -0.61 -0.7089 0.0455
>> 2     1        25 -0.58 -0.6625 0.0455
>> 3     2        16 -0.58 -0.6625 0.0769
>> 4     2        16 -0.60 -0.6931 0.0769
>> 5     2        16 -0.56 -0.6328 0.0769
>> 6     3        38 -0.55 -0.6184 0.0286
>>
>> I tried to implement rcalc() with my dataset, but the thing is that I
>> have two variable types that are constant (submax and submax tests for the
>> ICCs, and submax and max tests for *r*) and that were examined repeatedly
>> within samples, whereas the function seems to expect different variable
>> pairs within each sample/study, as in the example here:
>> https://wviechtb.github.io/metafor/reference/rcalc.html
>> Is that right? I'm quite lost here!
>>
>> (2) As you mentioned, I construct the covariance matrix for the raw mean
>> differences using impute_covariance_matrix(). However, as I understand it,
>> that function is mostly used with standardised mean difference effects?
>> Does the same logic hold for raw mean differences?
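>>
>> For reference, a minimal sketch of the call I mean, assuming a
>> guesstimated correlation of 0.7 between outcomes from the same sample:
>>
>> library(clubSandwich)
>> V <- impute_covariance_matrix(vi = dat$vi, cluster = dat$study.id, r = 0.7)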
>>
>> (3) Yes, my effect size index for SDs is the log-transformed SD (plus a
>> bias correction for sample size), using the method proposed by Nakagawa
>> et al. (2015)
>> https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12309
>> ("SDLN" in the escalc() function), whereby the sampling variance is based
>> solely on sample size: 1/(2(n-1)). To construct the V-matrix I used
>> Wolfgang's response on the mailing list to a similar question (see below):
>> https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2019-July/001630.html
>> Is this the right (or a fair) way to approach this? Does the delta method
>> work better here? If so, can you refer me to something I can work with?
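>>
>> A small sketch of these quantities, with values checked against row 1 of
>> the example below (r is the assumed correlation between outcomes, and the
>> covariance formula is the one from Wolfgang's post):
>>
>> n <- 25; SD <- 1.45; r <- 0.7
>> yi    <- log(SD) + 1 / (2 * (n - 1))  # bias-corrected log SD: 0.3924
>> vi    <- 1 / (2 * (n - 1))            # sampling variance: 0.0208
>> cov12 <- r^2 / (2 * (n - 1))          # assumed covariance between two log SDs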
>>
>> ## this is a small example of my dataset
>>
>> es.id study.id  n   SD     yi     vi
>> 1     1        25 1.45 0.3924 0.0208
>> 2     1        25 1.41 0.3644 0.0208
>> 3     2        16 1.05 0.0821 0.0333
>> 4     2        16 1.20 0.2157 0.0333
>> 5     2        16 1.27 0.2724 0.0333
>> 6     3        38 1.24 0.2286 0.0135
>>
>> I'll be grateful for any insight from you and others on the mailing
>> list.
>>
>> Thanks and Kind regards,
>>
>> Tzlil Shushan | Sport Scientist, Physical Preparation Coach
>>
>> BEd Physical Education and Exercise Science
>> MSc Exercise Science - High Performance Sports: Strength &
>> Conditioning, CSCS
>> PhD Candidate Human Performance Science & Sports Analytics
>>
>>
>>
>> On Wed, 20 Jan 2021 at 1:53, James Pustejovsky <jepusto using gmail.com>
>> wrote:
>>
>>> Hi Tzlil,
>>>
>>> Apologies for the long delay in responding to your query. You've raised
>>> some excellent questions about rather subtle issues. To your question (1),
>>> I would say that it is NOT compulsory to use a sampling variance-covariance
>>> matrix (the "V-matrix") for the effect sizes of each type. Omitting the
>>> V-matrix amounts to assuming a correlation of zero. If your goal is
>>> primarily to understand average effect sizes (of each type), then using
>>> robust standard errors/hypothesis tests/confidence intervals will work even
>>> if you have a mis-specified assumption about the correlation among effect
>>> sizes.
>>>
>>> That said, there are at least two potential benefits to using a V-matrix
>>> based on more plausible assumptions about the correlation between effect
>>> sizes. First, using a working model that is closer to the true dependence
>>> structure will yield a more precise estimate of the average effect. If
>>> you're just estimating an average effect of each type, the gain in
>>> precision will probably be pretty small. If you're estimating a
>>> meta-regression with a more complex set of predictors, the gains can be
>>> more substantial.
>>>
>>> The second potential benefit is that using a plausible V-matrix will
>>> give you better, more defensible estimates of the variance components
>>> (between-study variance and within-study variance). Whether based on REML
>>> or some other estimation method, the variance component estimates are NOT
>>> robust to mistaken assumptions about the sampling correlation structure.
>>> They'll be biased unless you have the sampling correlation structure
>>> approximately correct. So to the extent that understanding heterogeneity is
>>> important, I think it's worth working on building in a V-matrix.
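>>>
>>> To make this concrete, here is a minimal sketch of the two working
>>> models, assuming a dataset 'dat' with yi, vi, study.id, and es.id
>>> columns and an assumed V-matrix 'V':
>>>
>>> library(metafor)
>>>
>>> # omitting V amounts to assuming zero sampling correlation within studies
>>> fit0 <- rma.mv(yi, vi, random = ~ 1 | study.id/es.id, data = dat)
>>>
>>> # the same model with a plausible V-matrix; its variance component
>>> # estimates (the sigma^2 values) are the more defensible ones
>>> fit1 <- rma.mv(yi, V, random = ~ 1 | study.id/es.id, data = dat)
>>>
>>> # cluster-robust inference protects the average effect under either model
>>> robust(fit0, cluster = dat$study.id)
>>> robust(fit1, cluster = dat$study.id)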
>>>
>>> To your question (2), I like the approach you've outlined, where you use
>>> different V-matrices for each of the effect indices you're looking at. I
>>> think ideally, you would start by making a single assumption about the
>>> degree of correlation between the *outcomes*, and then use that to
>>> derive the appropriate degree of correlation for each of the indices:
>>> * For raw mean differences, the correlation between outcomes will
>>> translate directly into the correlation between mean differences.
>>> * For SDs, I'm not sure exactly what your ES index is. Is it the log
>>> ratio of SDs? How did you arrive at the formula for the correlation between
>>> effect sizes? I don't know of a source for this, but it could be derived
>>> via the delta method.
>>> * For ICCs and Pearson correlations, using Wolfgang's function would be
>>> the way to go. Perhaps if you can provide a small example of your data and
>>> syntax that you've attempted, folks on the list can provide guidance about
>>> applying the function.
>>>
>>> Kind Regards,
>>> James
>>>
>>> On Thu, Jan 7, 2021 at 5:16 PM Tzlil Shushan <tzlil21092 using gmail.com>
>>> wrote:
>>>
>>>> Dear Wolfgang and James,
>>>>
>>>> Apologies for the long essay in advance.
>>>>
>>>> In my meta-analysis I obtained different effect sizes coming from
>>>> test-retest and correlational designs. Accordingly, I performed 4
>>>> separate meta-analyses, one for each effect size index:
>>>>
>>>> Raw mean difference of test-retest
>>>> Standard deviation (using the Nakagawa et al. 2015 approach) of test-retest
>>>> Intraclass correlation (transformed to Fisher's z values) of test-retest
>>>> Pearson correlation coefficient (transformed to Fisher's z values)
>>>> derived from the same test against a criterion measure.
>>>>
>>>> Because many studies meeting the inclusion criteria provided more than
>>>> one effect size through various kinds of repeated measures (for example,
>>>> multiple intensities of the test, repeated measures across the year), all
>>>> based on a common sample of participants, I treated each unique sample as
>>>> an independent study (NOTE: this approach serves our purposes best, and
>>>> adding a further level would result in a low number of clusters, which I
>>>> don't want given the use of RVE).
>>>>
>>>> Thanks to the great discussions in this group, we've done the following:
>>>> (1) Used rma.mv() to estimate the overall average effect and the variance
>>>> components in a hierarchical working model; the same for the
>>>> meta-regressions we performed.
>>>>
>>>> (2) Computed robust variance estimates with the robust() and coef_test()
>>>> functions, clustering at the level of studies (the same applies to both
>>>> the overall models and the meta-regressions).
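>>>>
>>>> As a minimal sketch of steps (1) and (2), assuming a dataset 'dat' with
>>>> yi, vi, study.id and es.id columns as in the examples above:
>>>>
>>>> library(metafor)
>>>> library(clubSandwich)
>>>>
>>>> # (1) hierarchical working model: effect sizes nested within studies
>>>> res <- rma.mv(yi, vi, random = ~ 1 | study.id/es.id, data = dat)
>>>>
>>>> # (2) cluster-robust (CR2) tests of the model coefficients
>>>> coef_test(res, vcov = "CR2", cluster = dat$study.id)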
>>>>
>>>> However, after reading some threads in the group over the last few weeks,
>>>>
>>>> https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2021-January/002565.html
>>>>
>>>> https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2018-February/000647.html
>>>> and more... I think that one step further is to provide
>>>> variance-covariance matrices for each meta-analysis before steps 1 and 2
>>>> noted above.
>>>>
>>>> In this regard I have some other questions:
>>>>
>>>> (1) Is it compulsory to create (an estimate of) the variance-covariance
>>>> matrix, given the structure of my dataset?
>>>>
>>>> (2) IF YES, I'm not sure whether I can use the same covariance formulas
>>>> for all effect sizes. For example, does impute_covariance_matrix() from
>>>> clubSandwich work fine with all effect sizes (mean diff, SD, ICC, etc.)?
>>>> Or should I estimate the covariance matrix with a unique function for
>>>> each effect size?
>>>>
>>>> Based on reading and suggestions:
>>>> • I used impute_covariance_matrix() for mean difference.
>>>>
>>>> • For standard deviations I constructed the function below, following
>>>> http://www.metafor-project.org/doku.php/analyses:gleser2009
>>>> (r is the assumed correlation between outcomes; the off-diagonal
>>>> covariance r^2/(2(n-1)) matches the SDLN sampling variance 1/(2(n-1)),
>>>> and a common n per study is assumed):
>>>> calc.v <- function(x) {
>>>>   v <- matrix(r^2 / (2 * (x$ni[1] - 1)), nrow = nrow(x), ncol = nrow(x))
>>>>   diag(v) <- x$vi
>>>>   v
>>>> }
>>>> V <- bldiag(lapply(split(dat, dat$study.id), calc.v))
>>>>
>>>> • For the ICC and Pearson correlation I've looked at
>>>> https://wviechtb.github.io/metafor/reference/rcalc.html but I couldn't
>>>> create something appropriate for my dataset (I don't really know how to
>>>> specify var1 and var2).
>>>>
>>>> With this in mind, I ran a sensitivity analysis (with r = 0.3, 0.5, 0.7
>>>> and 0.9), which revealed similar overall estimates (also similar to those
>>>> from the working models without a covariance matrix), albeit with somewhat
>>>> changed magnitudes of sigma2.1 and sigma2.2.
>>>>
>>>> I'd be thankful for any thoughts.
>>>>
>>>> Kind regards and thanks in advance!
>>>>
>>>> Tzlil
>>>>
>>>> Tzlil Shushan | Sport Scientist, Physical Preparation Coach
>>>>
>>>> BEd Physical Education and Exercise Science
>>>> MSc Exercise Science - High Performance Sports: Strength &
>>>> Conditioning, CSCS
>>>> PhD Candidate Human Performance Science & Sports Analytics
>>>>
>>>>
