[R-meta] Best choice of effect size

Sun Oct 3 18:31:16 CEST 2021

Hi Luke,

Based on your responses, I think the response ratio could be an
appropriate effect measure and further that there could be drawbacks
to using the standardized mean difference. Let me note potential
drawbacks first.

* Variation in the number of possible errors (and perhaps also in the
length of the time provided for the test?) suggests that the measures
from different studies may have varying degrees of reliability.
Varying reliability introduces heterogeneity in the SMD (because the
denominator is inflated or shrunk by the degree of reliability).

* A relationship between the M and SD of the proportions for a given
group suggests that the distribution of the individual-level outcomes
might also exhibit mean-variance relationships. (I say "suggests"
rather than "implies" because there's an ecological inference here,
i.e., assuming something about individual-level variation on the basis
of group-level variation.) If this supposition is reasonable, then
that introduces a further potential source of heterogeneity in the
SMDs (study-to-study variation in the M for the reference group
influences the SD of the reference group, thereby inflating or
shrinking the SMDs).
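
A small sketch of the first point, under classical test theory
assumptions (observed score = true score + independent, mean-zero
error; this framing is an assumption added for illustration, and all
numbers are hypothetical). Measurement error leaves the mean
difference unchanged but inflates the observed SD by a factor of
1 / sqrt(reliability), so the same underlying effect yields different
SMDs at different reliabilities:

```r
# Hypothetical values, for illustration only
true_diff   <- 0.5               # mean difference on the outcome scale
true_sd     <- 1.0               # SD of the true scores
reliability <- c(0.9, 0.7, 0.5)  # three studies with differing reliability

# Classical test theory: observed variance = true variance / reliability
observed_sd  <- true_sd / sqrt(reliability)

# The mean difference is unaffected; only the denominator changes
observed_smd <- true_diff / observed_sd

round(observed_smd, 3)
```

The observed SMD shrinks in proportion to sqrt(reliability) even though
the effect on the raw scale is identical in all three cases.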

The response ratio does not have these same concerns because it is a
function of the group means alone. (The standard error of the response
ratio involves the SD of each group, but the effect size metric itself
does not.) Further, you noted that the group means are not too near
the extremes of the scale, so the (log-transformed) response ratio
should be reasonably "well-behaved" in terms of its sampling
distribution.
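
To illustrate, here is a minimal base-R sketch that computes both
metrics from group summary statistics, using the proportion example
from earlier in this thread. The function name effect_sizes() is
made up for this sketch. The SMD uses the common approximate
small-sample correction, so it may differ slightly from what
escalc(measure="SMD") returns. The log response ratio formulas are the
usual delta-method ones, which escalc() offers as measure="ROM".

```r
# Effect sizes from group means, SDs, and sample sizes (base R only).
# effect_sizes() is a hypothetical helper, not a metafor function.
effect_sizes <- function(m1, sd1, n1, m2, sd2, n2) {
  # Log response ratio: a function of the two means alone
  lnRR   <- log(m1 / m2)
  # Delta-method sampling variance: the SDs enter only here
  v_lnRR <- sd1^2 / (n1 * m1^2) + sd2^2 / (n2 * m2^2)

  # Standardized mean difference with approximate bias correction
  df  <- n1 + n2 - 2
  s_p <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df)  # pooled SD
  J   <- 1 - 3 / (4 * df - 1)
  g   <- J * (m1 - m2) / s_p
  v_g <- 1 / n1 + 1 / n2 + g^2 / (2 * (n1 + n2))

  data.frame(lnRR, v_lnRR, g, v_g)
}

# Proportion example quoted below in this thread
es <- effect_sizes(m1 = 0.45, sd1 = 0.17, n1 = 20,
                   m2 = 0.17, sd2 = 0.11, n2 = 19)
es

# Doubling both SDs changes the SMD but leaves the log response ratio alone
es_wide <- effect_sizes(m1 = 0.45, sd1 = 0.34, n1 = 20,
                        m2 = 0.17, sd2 = 0.22, n2 = 19)
es_wide
```

Comparing es and es_wide shows the point made above: the SDs affect the
precision of the response ratio but not the estimate itself, whereas
they change the SMD directly.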

In light of the above, here's how I might proceed if I were conducting
this analysis:
1. Calculate *both* SMDs and log-transformed response ratios for the
full set of studies.
2. Examine the distribution of effect size estimates for each metric
(using histograms or funnel plots). If one of the distributions is
skewed or has extreme outliers, take that as an indication that the
metric might not be appropriate.
3. Fit meta-analytic models to summarize the distribution of effect
sizes in each metric, using a model that appropriately describes the
dependence structure of the estimates. Calculate I-squared statistics
and give preference to the metric with the lower I-squared.
4. If (2) and (3) don't lead to a clearly preferable metric, then
choose between SMD and RR based on whichever will make the synthesis
results easier to explain to people.
5. (Optional/extra credit) Whichever metric you choose, repeat your
main analyses using the other metric and stuff all those results in
supplementary materials, to satisfy any inveterate statistical
curmudgeons who might review/read your synthesis.
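
A rough sketch of the I-squared comparison in step 3, in base R. It
makes a strong simplifying assumption: the estimates are treated as
independent, which step 3 explicitly cautions against when a study
contributes several effect sizes (a multilevel model, e.g. via
metafor's rma.mv(), would be needed there). The function name
i_squared() and all of the numbers are made up purely to exercise the
code; they are not results from any real synthesis.

```r
# Q-based I-squared (in percent) for estimates yi with sampling variances vi.
# Assumes independent estimates; i_squared() is a hypothetical helper.
i_squared <- function(yi, vi) {
  wi <- 1 / vi                       # inverse-variance weights
  mu <- sum(wi * yi) / sum(wi)       # common-effect pooled estimate
  Q  <- sum(wi * (yi - mu)^2)        # Cochran's Q
  df <- length(yi) - 1
  if (Q <= df) return(0)             # truncate at zero
  100 * (Q - df) / Q
}

# Made-up effect sizes for the same five hypothetical studies in each metric
smd_yi  <- c(0.20, 0.85, 1.40, 0.35, 1.10)
smd_vi  <- c(0.05, 0.06, 0.08, 0.04, 0.07)
lnrr_yi <- c(0.30, 0.42, 0.38, 0.25, 0.45)
lnrr_vi <- c(0.010, 0.012, 0.015, 0.009, 0.013)

I2_smd  <- i_squared(smd_yi, smd_vi)
I2_lnrr <- i_squared(lnrr_yi, lnrr_vi)
c(SMD = I2_smd, lnRR = I2_lnrr)
```

With real data, the histograms from step 2 could be drawn with
hist(smd_yi) and hist(lnrr_yi) before comparing the two I-squared
values.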

James

> On Oct 1, 2021, at 12:39 AM, Luke Martinez <martinezlukerm using gmail.com> wrote:
>
> Dear James,
>
> Thank you for the insightful comments. Here are my answers inline:
>
>>> 1- Is the total number possible the same for the groups being compared within a given study?
>
> Not necessarily.
>
>>> 2- Did some studies use passages with many possible errors to be corrected while other studies used passages with just a few errors?
>
> Yes, that's correct. Passage characteristics are fully coded for as
> potential moderators.
>
>>> 3- Did the difficulty of the passages differ from study to study?
>
> Yes, that's correct. Studies with more advanced students used more
> difficult passages.
>
>>> 4- Were there very low or very high mean proportions in any studies?
>
> No, the means were never very close to 0 or 1.
>
>>> 5- Does there seem to be a relationship between the means and the variances of the proportions of a given group?
>
> Assuming you mean the following, yes:
>
> group1_M_prop = c(.39, .18, .13)
> group1_SD_prop = c(.25, .16, .13)
>
> plot(group1_M_prop, group1_SD_prop^2)
>
> Thanks,
> Luke
>
>> On Thu, Sep 30, 2021 at 10:17 PM James Pustejovsky <jepusto using gmail.com> wrote:
>>
>> Hi Luke,
>>
>> To add to Wolfgang's comments, I would suggest that you could also consider other effect measures besides the SMD. For example, the response ratio is also a scale-free metric that could work with the proportion outcomes that you've described, and would also be appropriate for raw frequency counts as long as the total number possible is the same for the groups being compared within a given study.
>>
>> Whether the response ratio would be more appropriate than the SMD is hard to gauge. One would need to know more about how the proportions were assessed and how the assessment procedures varied from study to study. For instance, did some studies use passages with many possible errors to be corrected while other studies used passages with just a few errors? Did the difficulty of the passages differ from study to study? Were there very low or very high mean proportions in any studies? Does there seem to be a relationship between the means and the variances of the proportions of a given group?
>>
>> James
>>
>>> On Thu, Sep 30, 2021 at 2:22 AM Luke Martinez <martinezlukerm using gmail.com> wrote:
>>>
>>> Dear Wolfgang,
>>>
>>> Thank you so much for your response and also the references.
>>>
>>> I will compute an SMD from the means and sds of all types of proportions
>>> and the raw counts reported in the papers.
>>>
>>> Instead of a moderator, I thought I would add a random effect for the variation
>>> in these types of proportions and raw counts, which will be crossed with
>>> studies (I think), because true effects can be correlated (?) due to
>>> sharing a study as well as sharing one of these types of proportions or raw
>>> counts, right?
>>>
>>> proportion_type1 = # of corrected items / all items needing correction
>>>
>>> proportion_type2 = # of corrected items / (all items needing
>>> correction + all wrongly corrected items)
>>>
>>> raw_counts = # of corrected items
>>>
>>>
>>>
>>> On Thu, Sep 30, 2021, 1:33 AM Viechtbauer, Wolfgang (SP) <
>>> wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>>>
>>>> Hi Luke,
>>>>
>>>> Yes, treating the mean proportions as means is ok -- after all, they are
>>>> means. As long as n is not too small (and the true mean proportion not too
>>>> close to 0 or 1), then the CLT will also ensure that the sampling
>>>> distribution of a mean proportion is approximately normal.
>>>>
>>>> We have analyzed such mean proportions in these articles:
>>>>
>>>> McCurdy, M. P., Viechtbauer, W., Sklenar, A. M., Frankenstein, A. N., &
>>>> Leshikar, E. D. (2020). Theories of the generation effect and the impact of
>>>> generation constraint: A meta-analytic review. Psychonomic Bulletin &
>>>> Review, 27(6), 1139-1165. https://doi.org/10.3758/s13423-020-01762-3
>>>>
>>>> Vachon, H., Viechtbauer, W., Rintala, A., & Myin-Germeys, I. (2019).
>>>> Compliance and retention with the experience sampling method over the
>>>> continuum of severe mental disorders: Meta-analysis and recommendations.
>>>> Journal of Medical Internet Research, 21(12), e14475.
>>>> https://doi.org/10.2196/14475
>>>>
>>>> In these articles, we did not compute standardized mean differences based
>>>> on the mean proportions, but one could do so.
>>>>
>>>> For the data below:
>>>>
>>>> library(metafor)  # provides escalc()
>>>> escalc(measure="SMD", m1i=0.45, m2i=0.17, sd1i=0.17, sd2i=0.11, n1i=20,
>>>> n2i=19)
>>>>
>>>> If I understand you correctly, the second type are means of counts (i.e.,
>>>> there is a count for each subject and for example 4.5 is the mean of those
>>>> counts). Again, while an individual count might have other distributional
>>>> properties (e.g., Poisson or negative binomial), once you take the mean,
>>>> it's a mean and the CLT 'kicks in'. So I would again say: yes, you can
>>>> treat these as 'regular' means and compute SMDs based on them.
>>>>
>>>> For the data below:
>>>>
>>>> escalc(measure="SMD", m1i=4.5, m2i=4.7, sd1i=1.12, sd2i=1.59, n1i=17,
>>>> n2i=18)
>>>>
>>>> I might be inclined to code a moderator that distinguishes these different
>>>> types, to see if there is some systematic difference between them.
>>>>
>>>> Best,
>>>> Wolfgang
>>>>
>>>>> -----Original Message-----
>>>>> From: Luke Martinez [mailto:martinezlukerm using gmail.com]
>>>>> Sent: Thursday, 30 September, 2021 0:32
>>>>> To: R meta
>>>>> Cc: Viechtbauer, Wolfgang (SP); James Pustejovsky
>>>>> Subject: Re: Best choice of effect size
>>>>>
>>>>> Dear All,
>>>>>
>>>>> To further clarify, the proportion types (my previous email) are used
>>>>> to score each study participant's performance on the text. Then, each
>>>>> study reports the "mean" and "sd" of a proportion type for control and
>>>>> experimental groups (to then compare them with t-tests and ANOVAs).
>>>>>
>>>>> For example, a study using proportion_type1 (see my previous email)
>>>>> can provide the following for effect size calculation:
>>>>>
>>>>>              Mean    SD     n
>>>>> group1   0.45      0.17  20
>>>>> group2   0.17      0.11  19
>>>>>
>>>>> The same is true for studies that use raw frequencies to score each
>>>>> study participant's performance on the text. Such studies often report
>>>>> the "mean" and "sd" of the # of corrected items (numerator of the
>>>>> proportions in my previous email) for control and experimental groups
>>>>> (to then compare them with t-tests and ANOVAs).
>>>>>
>>>>> For example, a study using (raw) # of corrected items can provide the
>>>>> following for effect size calculation:
>>>>>
>>>>>              Mean    SD   n
>>>>> group1   4.5      1.12  17
>>>>> group2   4.7      1.59  18
>>>>>
>>>>> My question is: can I calculate SMDs across all such studies, given
>>>>> that their intent is to measure the same thing?
>>>>>
>>>>> Thank you,
>>>>> Luke
>>>>>
>>>>> On Wed, Sep 29, 2021 at 12:12 PM Luke Martinez <martinezlukerm using gmail.com> wrote:
>>>>>>
>>>>>> Dear All,
>>>>>>
>>>>>> I'm doing a meta-analysis where the papers report only "mean" and "sd"
>>>>>> of some form of proportion and/or "mean" and "sd" of corresponding raw
>>>>>> frequencies. (For context, the papers ask students to read, find, and
>>>>>> correct the wrong words in a text.)
>>>>>>
>>>>>> By some form of proportion, I mean, some papers report actual proportions:
>>>>>>
>>>>>> proportion_type1 = # of corrected items / all items needing correction
>>>>>>
>>>>>> Some papers report a modified version of proportions:
>>>>>>
>>>>>> proportion_type2 = # of corrected items / (all items needing
>>>>>> correction + all wrongly corrected items)
>>>>>>
>>>>>> There are other versions of proportions and corresponding raw
>>>>>> frequencies as well. But my question is: given that all these studies
>>>>>> only report "mean" and "sd", can I simply use an SMD effect size?
>>>>>>
>>>>>> Many thanks,
>>>>>> Luke
>>>>
>>>
>>> _______________________________________________
>>> R-sig-meta-analysis mailing list
>>> R-sig-meta-analysis using r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis