[R-meta] Meta-analysis approach for physical qualities benchmarks

James Pustejovsky jepusto using gmail.com
Fri Jul 5 00:01:23 CEST 2024


Hi Tzlil,

Some comments inline below.

James

On Wed, Jul 3, 2024 at 12:14 AM Tzlil Shushan <tzlil21092 using gmail.com> wrote:

> From reviewing the papers you referred me to for further reading and
> several examples from the metafor and mvmeta packages using bivariate
> approaches, it appears that these approaches are often applied to similar
> effect sizes (e.g., log odds ratios nested within intervention groups,
> intervention and control). Considering that my dataset includes means and
> SDs, which are two distinct effect sizes, would it be possible to
> meta-analyse them as separate outcomes within a single multivariate
> multilevel model?
>

Yes. It's true that multivariate models are often used for effects in the
same metric, but there is nothing about the model that prevents it from
being used with conceptually distinct outcomes (so long as those distinct
outcomes are correlated).
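
For instance, here is a minimal sketch of stacking the two sets of
estimates into one dataset (reusing the data_means and data_sd objects
from your code below; the "outcome" column is just an illustrative label):

library(metafor)

# Stack the two escalc datasets, tagging each row with the dimension
# (mean vs. log SD) it represents:
dat_long <- rbind(
  transform(data_means, outcome = "M"),
  transform(data_sd,    outcome = "lnSD")
)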


> Indeed, I've been trying a few options (might be silly ones), such as
> merging yi and vi from data_means and data_sd and adding measure (i.e.,
> means or sd) as a moderator in the model.
>

Those are good ideas, for sure.


> However, this approach does not seem appropriate when considering the
> variance components at different levels (i.e., sigma).
>

You can specify multivariate models in which each dimension has its own
variance component (i.e., the random-effects variance of the sample means
can differ from that of the sample variances).
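
In metafor syntax, a sketch of such a model might look like the following
(dat_long as stacked above; V_joint as sketched further below; struct =
"DIAG" gives each outcome its own tau^2, while "UN" also estimates their
correlation):

res_joint <- rma.mv(yi, V_joint,
                    mods = ~ outcome - 1,           # one pooled estimate per dimension
                    random = ~ outcome | Study.id,  # outcome-specific variance components
                    struct = "UN",
                    data = dat_long)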


> I also could not specify the covariance matrix of the two datasets because
> the two have different sampling variances.
>

For a normally distributed variable, the sample mean is independent of the
sample variance. (For other distributions, this isn't exactly true, but it
seems like a reasonable starting point.) Thus, if you stack the M effect
sizes and the lnSD effect sizes on top of each other, you can treat their
sampling errors as independent.
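
Concretely, since the off-diagonal blocks linking the two outcomes are zero
under that independence assumption, the joint sampling matrix can be
assembled block-diagonally (a sketch, assuming dat_long stacks the means
first and the lnSDs second, and that V_means and V_sd are full
block-diagonal matrices):

V_joint <- bldiag(V_means, V_sd)  # metafor::bldiag; zero cross-outcome blocks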


> Considering that the meta-analysis of SDs is conducted directly on the
> sampling variance of the means, might this justify the use of separate
> models?
>

I see what you mean. Your point highlights a limitation of what I'm
suggesting, but I don't think it justifies the use of separate models.
There are probably more principled ways of approaching this problem
(mixed-effects location-scale models, or what are called "generalized
additive models for location, scale and shape" or "Bayesian additive models
for location, scale and shape"), but I would not recommend that route
without more hands-on input from a statistician who knows such models.


> Interestingly, I have seen a few studies analysing means and SDs
> separately. They discussed the pooled SD as between-individual SD, but none
> of them used the two to build benchmarks (for example, z-scores).
>

Whether what I'm suggesting will make any difference at all (versus just
doing separate models) does depend on whether there's an association
between the M and SD parameters. Before going any further, it might be
useful to make some scatterplots of your effect size data to see if a
correlation is apparent.
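
For example, something as simple as this would show whether the two
dimensions co-vary across samples (a diagnostic sketch; "dat" stands for
your full escalc dataset, with the yi_mean and yi_sd columns from your
snippet):

plot(dat$yi_mean, dat$yi_sd,
     xlab = "Sample mean (yi_mean)", ylab = "Log SD (yi_sd)")
cor(dat$yi_mean, dat$yi_sd)  # rough summary of the association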

As I mentioned in my previous response, doing separate models is also still
informative and the results from those models might suggest whether going
further with the multivariate model is really needed. For example, if the
lnSDs are actually quite consistent across samples, with little
heterogeneity above and beyond sampling error, then there's probably little
point in doing a multivariate model, because there'd be no variation in one
of the dimensions.
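
One quick way to check this is to inspect the estimated variance components
of the separate lnSD fit (rma_sd_model in your code below):

rma_sd_model$sigma2  # per-level variance components; values near zero
                     # suggest little heterogeneity beyond sampling error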


> To provide further explanation of how my dataset looks, below is an
> example with the first 5 studies of one of the physical performance tests
> (20m sprint time). I did not include other columns which will be used as
> moderators in the following models.
>
> structure(list(Study.id = c("#4587", "#4587", "#11750", "#5320",
> "#5320", "#5320", "#5320", "#10188", "#10188", "#10188", "#10188",
> "#10188", "#10188", "#13817"), Group.id = c(2, 2, 3, 4, 5, 6,
> 7, 7, 8, 9, 10, 11, 12, 18), es.id = 1:14, n = c(16, 16, 23,
> 11, 11, 9, 6, 11, 13, 15, 10, 14, 12, 18), final.mean = c(3.39,
> 3.36, 3.52, 3.2, 3.3, 3.15, 3.41, 3.75, 3.68, 3.69, 3.71, 3.68,
> 3.64, 3.57), final.sd = c(0.21, 0.2, 0.18, 0.12, 0.16, 0.09,
> 0.17, 0.09, 0.1, 0.08, 0.06, 0.08, 0.1, 0.17), yi_mean = c(3.39,
> 3.36, 3.52, 3.2, 3.3, 3.15, 3.41, 3.75, 3.68, 3.69, 3.71, 3.68,
> 3.64, 3.57), vi_mean = c(0.003, 0.003, 0.001, 0.001, 0.002, 0.001,
> 0.005, 0.001, 0.001, 0, 0, 0, 0.001, 0.002), yi_sd = c(-1.527,
> -1.576, -1.692, -2.07, -1.783, -2.345, -1.672, -2.358, -2.261,
> -2.49, -2.758, -2.487, -2.257, -1.743), vi_sd = c(0.033, 0.033,
> 0.023, 0.05, 0.05, 0.062, 0.1, 0.05, 0.042, 0.036, 0.056, 0.038,
> 0.045, 0.029)), digits = c(est = 4, se = 4, test = 4, pval = 4,
> ci = 4, var = 4, sevar = 4, fit = 4, het = 4), row.names = c(NA,
> 14L), class = c("escalc", "data.frame"))
>
> Thanks for looking at the entire code and noticing that I've used the
> study level for computing the covariance matrix. While my dataset includes
> adult female soccer players only, many studies provided data for subgroups.
> I had no idea I could use groups for creating the covariance matrix and
> computing robust standard errors using study clusters.
>
> Would be great to hear back from you with some further thoughts.
>
> Best regards,
>
> Tzlil Shushan | Sport Scientist, Physical Preparation Coach
>
> BEd Physical Education and Exercise Science
> MSc Exercise Science - High Performance Sports: Strength &
> Conditioning, CSCS
> PhD Human Performance Science & Sports Analytics
>
>
>
> On Wed, Jul 3, 2024 at 4:00 James Pustejovsky <jepusto using gmail.com>
> wrote:
>
>> Hi Tzlil,
>>
>> From my perspective, your approach seems reasonable as a starting point
>> for characterizing the distribution of each of these quantities, but I
>> would be cautious about trying to create benchmarks based on the results of
>> two separate models. It seems like the benchmarks would be a non-linear
>> function of both the Ms and the SDs. Evaluating a non-linear function at
>> average values of the inputs does not produce the same result as evaluating
>> the average of a non-linear function of individual inputs, and it can be
>> poor even as an approximation. I would think that it would be preferable to
>> work towards a joint model for the Ms and SDs---treating them as two
>> dimensions of a bivariate effect size measure. I think this would be
>> feasible using multivariate meta-analysis models, for which the metafor
>> package provides extensive documentation. See also Gasparrini and
>> Armstrong (2011; https://doi.org/10.1002/sim.4226) and Sera et al.
>> (2019; https://doi.org/10.1002/sim.8362).
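>>
>> As a toy numeric illustration (made-up numbers, not from your data): the
>> z-score z = (x - M)/SD is non-linear in the SD, so the two orders of
>> operations disagree:
>>
>> x <- 3.5
>> M <- c(3.2, 3.6); SD <- c(0.10, 0.25)
>> mean((x - M) / SD)        # average of per-study z-scores: 1.3
>> (x - mean(M)) / mean(SD)  # z-score at the averaged M and SD: ~0.57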
>>
>> A further reason to consider a joint (multivariate) model is that for
>> many distributions other than the Gaussian, mean parameters and variance
>> parameters tend to be related. For instance, count data distributions
>> typically have variances that grow larger as the mean grows larger. If the
>> physical quantities that you are modeling follow such distributions, then
>> capturing the interrelationship between the M and SD could be important
>> both for purposes of obtaining precise summary estimates and for the
>> interpretation of the results.
>>
>> One other small note about your code: for purposes of creating a sampling
>> variance covariance matrix, it makes sense to impute covariances between
>> effect size estimates that are based on the same sample (or at least
>> partially overlapping samples). I see from your rma.mv code that you
>> have random effects for effect sizes nested in groups nested in studies. If
>> the groups within a study are independent (e.g., separate samples of male
>> and female athletes), then the effect sizes from different groups should
>> probably be treated as independent. In this case, your call to
>> impute_covariance_matrix() should cluster by Group.id instead of by
>> Study.id. But for purposes of computing robust standard errors, you would
>> still use cluster = Study.id.
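>>
>> In code, that split would look something like this (a sketch using the
>> object names from your code below):
>>
>> V_means <- impute_covariance_matrix(vi = data_means$vi,
>>                                     cluster = data_means$Group.id,  # independent groups
>>                                     r = .7,
>>                                     smooth_vi = TRUE)
>> robust(rma_means_model, cluster = data_means$Study.id,  # SEs clustered by study
>>        clubSandwich = TRUE)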
>>
>> James
>>
>> On Sun, Jun 30, 2024 at 7:31 PM Tzlil Shushan via R-sig-meta-analysis <
>> r-sig-meta-analysis using r-project.org> wrote:
>>
>>> Dear Wolfgang and R-sig-meta-analysis community,
>>>
>>> I would like to see if I can pick your thoughts about an approach I am
>>> using in my current meta-analysis research.
>>>
>>> We are conducting a meta-analysis on a range of physical qualities. The
>>> primary objective of these meta-analyses is to create benchmarks for
>>> previous and future observations.
>>>
>>> For example, one of the physical qualities includes sprint times from
>>> discrete distances (5m to 40m). We have gathered descriptive data (means
>>> and standard deviations) from approximately 250 studies.
>>>
>>> We aim to provide practitioners in the field with tools to compare the
>>> results of their athletes to this benchmarking meta-analysis. Therefore,
>>> we
>>> want to include commonly used tools in our field, such as z-scores and
>>> percentiles, to facilitate these comparisons, alongside measures of
>>> uncertainty using CIs and PIs.
>>>
>>> Given that these approaches require the sample/population standard
>>> deviations, I have conducted separate multilevel mixed-effects
>>> meta-analyses for means and standard deviations.
>>>
>>> Below is an example of the approach I am considering:
>>>
>>> ############
>>> Meta-analysis of means:
>>>
>>> # Mean as the effect size
>>> data_means <- escalc(measure = "MN",
>>>                      mi = Final.Outcome,
>>>                      sdi = Final.SD,
>>>                      ni = Sample.Size,
>>>                      data = data)
>>>
>>> V_means <- impute_covariance_matrix(vi = data_means$vi,
>>>                                     cluster = data_means$Study.id,
>>>                                     r = .7,
>>>                                     smooth_vi = TRUE)
>>>
>>> rma_means_model <- rma.mv(yi,
>>>                           V_means,
>>>                           random = list(~ 1 | Study.id/Group.id/ES.id),
>>>                           digits = 2,
>>>                           data = data_means,
>>>                           method = "REML",
>>>                           test = "t",
>>>                           control = list(optimizer = "optim",
>>>                                          optmethod = "Nelder-Mead"))
>>>
>>> robust_means_model <- robust(rma_means_model,
>>>                              cluster = data_means$Study.id,
>>>                              adjust = TRUE,
>>>                              clubSandwich = TRUE)
>>>
>>> est_robust_means_model <- predict(robust_means_model, digits = 2,
>>>                                   level = .9)
>>>
>>>
>>> ############
>>> Meta-analysis of SDs:
>>>
>>> # Log-transformed SD as the effect size
>>> data_sd <- escalc(measure = "SDLN",
>>>                   sdi = Final.SD,
>>>                   ni = Sample.Size,
>>>                   data = data)
>>>
>>> V_sd <- impute_covariance_matrix(vi = data_sd$vi,
>>>                                  cluster = data_sd$Study.id,
>>>                                  r = .7,
>>>                                  smooth_vi = TRUE)
>>>
>>> rma_sd_model <- rma.mv(yi,
>>>                        V_sd,
>>>                        random = list(~ 1 | Study.id/Group.id/ES.id),
>>>                        digits = 2,
>>>                        data = data_sd,
>>>                        method = "REML",
>>>                        test = "t",
>>>                        control = list(optimizer = "optim",
>>>                                       optmethod = "Nelder-Mead"))
>>>
>>> robust_sd_model <- robust(rma_sd_model,
>>>                           cluster = data_sd$Study.id,
>>>                           adjust = TRUE,
>>>                           clubSandwich = TRUE)
>>>
>>> est_robust_sd_model <- predict(robust_sd_model, digits = 2,
>>>                                transf = transf.exp.int, level = .9)
>>>
>>> I would greatly appreciate your thoughts/feedback on whether this
>>> approach
>>> is statistically sound. Specifically, is it appropriate to conduct
>>> separate
>>> meta-analyses for means and SDs and then use the pooled estimates for
>>> creating benchmarks? Are there any potential pitfalls or alternative
>>> methods you would recommend?
>>>
>>> Tzlil Shushan | Sport Scientist, Physical Preparation Coach
>>>
>>> BEd Physical Education and Exercise Science
>>> MSc Exercise Science - High Performance Sports: Strength &
>>> Conditioning, CSCS
>>> PhD Human Performance Science & Sports Analytics
>>>
>>
