[R-meta] Best choice of effect size

Luke Martinez martinezlukerm using gmail.com
Mon Oct 4 02:18:13 CEST 2021


Sure, my understanding is that if the relationship between means and
SDs affects M_c and M_t equally, no major issues arise. But if the two
groups are differentially affected by that relationship, then that can
bias the SMD up or down, no?
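
A minimal sketch of this point, using made-up numbers rather than data from any of the studies discussed in this thread. The data frame `studies` and its column names are hypothetical. Two studies share the same group means, but in study B the treatment group has a larger SD. The SMD is computed by hand (no small-sample correction) alongside the log response ratio:

```r
# Hypothetical summary statistics (made up for illustration only).
# Both studies share the same group means; only the treatment-group SD differs.
studies <- data.frame(
  study = c("A", "B"),
  m_t   = c(0.30, 0.30),  # treatment mean proportion
  m_c   = c(0.20, 0.20),  # control mean proportion
  sd_t  = c(0.15, 0.25),  # treatment SD (larger in study B)
  sd_c  = c(0.15, 0.15),  # control SD
  n_t   = c(20, 20),
  n_c   = c(20, 20)
)

# Pooled SD and the (uncorrected) standardized mean difference
sd_pooled <- with(studies, sqrt(((n_t - 1) * sd_t^2 + (n_c - 1) * sd_c^2) /
                                (n_t + n_c - 2)))
studies$smd <- with(studies, (m_t - m_c) / sd_pooled)

# Log response ratio: depends on the group means only
studies$lrr <- with(studies, log(m_t / m_c))

print(studies[, c("study", "smd", "lrr")])
```

In this sketch the SMD is smaller in study B than in study A even though the means are identical, while the log response ratio is the same in both. With metafor, the corresponding estimates would come from escalc(measure="SMD", ...) and escalc(measure="ROM", ...). Note that escalc applies a small-sample bias correction to the SMD, so its values would differ slightly from this hand calculation.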

On Sun, Oct 3, 2021 at 7:09 PM James Pustejovsky <jepusto using gmail.com> wrote:
>
> Hi Luke,
> Responses inline below.
> James
>
> On Sun, Oct 3, 2021 at 3:16 PM Luke Martinez <martinezlukerm using gmail.com> wrote:
>>
>> Dear James,
>>
>> Thank you for the thorough and thought-provoking response. Here are my
>> two takeaways:
>>
>>  1- Your insightful advice seems to be a criticism of SMDs in
>> general, due to the use of some form of SD in the denominator, and not
>> just in my situation (i.e., studies reporting M and SD
>> of proportions and/or counts), right?
>>
> I would not go quite that far. The concerns I raised with the SMD are more salient when dealing with outcomes that are proportions or counts.
>
>>
>> 2- When using SMDs, one has to keep an open eye regarding reliability
>> estimates, and factors affecting them (e.g., time provided for the
>> test) in the studies and possibly control for them in the analysis,
>> right?
>>
> Yes. Although, I would add that using an effect measure that is invariant (or at least relatively robust) to such factors is preferable to trying to account for the factors using meta-regression.
>
>>
>> I also wanted to clarify two things:
>>
>> First, by log-transformed response ratio, you mean "ROM" or "ROMC" as
>> represented in metafor::escalc?
>>
> Yes.
>
>> Second, by reference group, you simply mean the mean for each
>> treatment group as denoted by M_t in (M_t - M_c) / Pooled_SD?
>>
> I had in mind the control groups (M_c), although my comment would apply equally to the treatment groups.
>
>>
>> Respectfully,
>> Luke
>>
>> On Sun, Oct 3, 2021 at 11:31 AM James Pustejovsky <jepusto using gmail.com> wrote:
>> >
>> > Hi Luke,
>> >
>> > Based on your responses, I think the response ratio could be an
>> > appropriate effect measure and further that there could be drawbacks
>> > to using the standardized mean difference. Let me note potential
>> > drawbacks first.
>> >
>> > * Variation in the number of possible errors (and perhaps also in the
>> > length of the time provided for the test?) suggests that the measures
>> > from different studies may have varying degrees of reliability.
>> > Varying reliability introduces heterogeneity in the SMD (because the
>> > denominator is inflated or shrunk by the degree of reliability).
>> >
>> > * A relationship between the M and SD of the proportions for a given
>> > group suggests that the distribution of the individual-level outcomes
>> > might also exhibit mean-variance relationships. (I say "suggests"
>> > rather than implies because there's an ecological inference here,
>> > i.e., assuming something about individual-level variation on the basis
>> > of group-level variation). If this supposition is reasonable, then
>> > that introduces a further potential source of heterogeneity in the
>> > SMDs (study-to-study variation in the M for the reference group
>> > influences the SD of the reference group, thereby inflating or
>> > shrinking the SMDs).
>> >
>> > The response ratio does not have these same concerns because it is a
>> > function of the group means alone. (The standard error of the response
>> > ratio involves the SD of each group, but the effect size metric itself
>> > does not.) Further, you noted that the group means are not too near
>> > the extremes of the scale, so the (log-transformed) response ratio
>> > should be reasonably "well-behaved" in terms of its sampling
>> > distribution.
>> >
>> > In light of the above, here's how I might proceed if I were conducting
>> > this analysis:
>> > 1. Calculate *both* SMDs and log-transformed response ratios for the
>> > full set of studies.
>> > 2. Examine the distribution of effect size estimates for each metric
>> > (using histograms or funnel plots). If one of the distributions is
>> > skewed or has extreme outliers, take that as an indication that the
>> > metric might not be appropriate.
>> > 3. Fit meta-analytic models to summarize the distribution of effect
>> > sizes in each metric, using a model that appropriately describes the
>> > dependence structure of the estimates. Calculate I-squared statistics,
>> > give preference to the metric with lower I-squared.
>> > 4. If (2) and (3) don't lead to a clearly preferable metric, then
>> > choose between SMD and RR based on whichever will make the synthesis
>> > results easier to explain to people.
>> > 5. (Optional/extra credit) Whichever metric you choose, repeat your
>> > main analyses using the other metric and stuff all those results in
>> > supplementary materials, to satisfy any inveterate statistical
>> > curmudgeons who might review/read your synthesis.
>> >
>> > James
>> >
>> >
>> > > On Oct 1, 2021, at 12:39 AM, Luke Martinez <martinezlukerm using gmail.com> wrote:
>> > >
>> > > Dear James,
>> >
>> > >
>> > > Thank you for the insightful comments. Here are my answers inline:
>> > >
>> > >>> 1- Is the total number possible the same for the groups being compared within a given study?
>> > >
>> > > Not necessarily.
>> > >
>> > >>> 2- Did some studies use passages with many possible errors to be corrected while other studies used passages with just a few errors?
>> > >
>> > > Yes, that's correct. Passage characteristics are fully coded for as
>> > > potential moderators.
>> > >
>> > >>> 3- Did the difficulty of the passages differ from study to study?
>> > >
>> > > Yes, that's correct. Studies with more advanced students used more
>> > > difficult passages.
>> > >
>> > >>> 4- Were there very low or very high mean proportions in any studies?
>> > >
>> > > No, means were never very close to 0 or 1.
>> > >
>> > >>> 5- Does there seem to be a relationship between the means and the variances of the proportions of a given group?
>> > >
>> > > Assuming you mean the following, yes:
>> > >
>> > > group1_M_prop = c(.39, .18, .13)
>> > > group1_SD_prop = c(.25, .16, .13)
>> > >
>> > > plot(group1_M_prop, group1_SD_prop^2)
>> > >
>> > > Thanks,
>> > > Luke
>> > >
>> > >> On Thu, Sep 30, 2021 at 10:17 PM James Pustejovsky <jepusto using gmail.com> wrote:
>> > >>
>> > >> Hi Luke,
>> > >>
>> > >> To add to Wolfgang's comments, I would suggest that you could also consider other effect measures besides the SMD. For example, the response ratio is also a scale-free metric that could work with the proportion outcomes that you've described, and would also be appropriate for raw frequency counts as long as the total number possible is the same for the groups being compared within a given study.
>> > >>
>> > >> Whether the response ratio would be more appropriate than the SMD is hard to gauge. One would need to know more about how the proportions were assessed and how the assessment procedures varied from study to study. For instance, did some studies use passages with many possible errors to be corrected while other studies used passages with just a few errors? Did the difficulty of the passages differ from study to study? Were there very low or very high mean proportions in any studies? Does there seem to be a relationship between the means and the variances of the proportions of a given group?
>> > >>
>> > >> James
>> > >>
>> > >>> On Thu, Sep 30, 2021 at 2:22 AM Luke Martinez <martinezlukerm using gmail.com> wrote:
>> > >>>
>> > >>> Dear Wolfgang,
>> > >>>
>> > >>> Thank you so much for your response and also the references.
>> > >>>
>> > >>> I will compute an SMD from the means and sds of all types of proportions
>> > >>> and the raw counts reported in the papers.
>> > >>>
>> > >>> Instead of a moderator, I thought I would add a random effect for the variation
>> > >>> in these types of proportions and raw counts, which will be crossed with
>> > >>> studies (I think), because true effects can be correlated (?) due to
>> > >>> sharing a study as well as sharing one of these types of proportions or raw
>> > >>> counts, right?
>> > >>>
>> > >>> proportion_type1 = # of corrected items / all items needing correction
>> > >>>
>> > >>> proportion_type2 = # of corrected items / (all items needing
>> > >>> correction + all wrongly corrected items)
>> > >>>
>> > >>> raw_counts = # of corrected items
>> > >>>
>> > >>>
>> > >>>
>> > >>> On Thu, Sep 30, 2021, 1:33 AM Viechtbauer, Wolfgang (SP) <
>> > >>> wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>> > >>>
>> > >>>> Hi Luke,
>> > >>>>
>> > >>>> Yes, treating the mean proportions as means is ok -- after all, they are
>> > >>>> means. As long as n is not too small (and the true mean proportion not too
>> > >>>> close to 0 or 1), then the CLT will also ensure that the sampling
>> > >>>> distribution of a mean proportion is approximately normal.
>> > >>>>
>> > >>>> We have analyzed such mean proportions in these articles:
>> > >>>>
>> > >>>> McCurdy, M. P., Viechtbauer, W., Sklenar, A. M., Frankenstein, A. N., &
>> > >>>> Leshikar, E. D. (2020). Theories of the generation effect and the impact of
>> > >>>> generation constraint: A meta-analytic review. Psychonomic Bulletin &
>> > >>>> Review, 27(6), 1139-1165. https://doi.org/10.3758/s13423-020-01762-3
>> > >>>>
>> > >>>> Vachon, H., Viechtbauer, W., Rintala, A., & Myin-Germeys, I. (2019).
>> > >>>> Compliance and retention with the experience sampling method over the
>> > >>>> continuum of severe mental disorders: Meta-analysis and recommendations.
>> > >>>> Journal of Medical Internet Research, 21(12), e14475.
>> > >>>> https://doi.org/10.2196/14475
>> > >>>>
>> > >>>> In these articles, we did not compute standardized mean differences based
>> > >>>> on the mean proportions, but one could do so.
>> > >>>>
>> > >>>> For the data below:
>> > >>>>
>> > >>>> escalc(measure="SMD", m1i=0.45, m2i=0.17, sd1i=0.17, sd2i=0.11, n1i=20,
>> > >>>> n2i=19)
>> > >>>>
>> > >>>> If I understand you correctly, the second type are means of counts (i.e.,
>> > >>>> there is a count for each subject and for example 4.5 is the mean of those
>> > >>>> counts). Again, while an individual count might have other distributional
>> > >>>> properties (e.g., Poisson or negative binomial), once you take the mean,
>> > >>>> it's a mean and the CLT 'kicks in'. So I would again say: yes, you can
>> > >>>> treat these as 'regular' means and compute SMDs based on them.
>> > >>>>
>> > >>>> For the data below:
>> > >>>>
>> > >>>> escalc(measure="SMD", m1i=4.5, m2i=4.7, sd1i=1.12, sd2i=1.59, n1i=17,
>> > >>>> n2i=18)
>> > >>>>
>> > >>>> I might be inclined to code a moderator that distinguishes these different
>> > >>>> types, to see if there is some systematic difference between them.
>> > >>>>
>> > >>>> Best,
>> > >>>> Wolfgang
>> > >>>>
>> > >>>>> -----Original Message-----
>> > >>>>> From: Luke Martinez [mailto:martinezlukerm using gmail.com]
>> > >>>>> Sent: Thursday, 30 September, 2021 0:32
>> > >>>>> To: R meta
>> > >>>>> Cc: Viechtbauer, Wolfgang (SP); James Pustejovsky
>> > >>>>> Subject: Re: Best choice of effect size
>> > >>>>>
>> > >>>>> Dear All,
>> > >>>>>
>> > >>>>> To further clarify, the proportion types (my previous email) are used
>> > >>>>> to score each study participant's performance on the text. Then, each
>> > >>>>> study reports the "mean" and "sd" of a proportion type for control and
>> > >>>>> experimental groups (to then compare them with t-tests and ANOVAs).
>> > >>>>>
>> > >>>>> For example, a study using proportion_type1 (see my previous email)
>> > >>>>> can provide the following for effect size calculation:
>> > >>>>>
>> > >>>>>          Mean   SD     n
>> > >>>>> group1   0.45   0.17   20
>> > >>>>> group2   0.17   0.11   19
>> > >>>>>
>> > >>>>> The same is true for studies that use raw frequencies to score each
>> > >>>>> study participant's performance on the text. Such studies often
>> > >>>>> report the "mean" and "sd" of the # of corrected items (numerator of the
>> > >>>>> proportions in my previous email) for control and experimental groups
>> > >>>>> (to then compare them with t-tests and ANOVAs).
>> > >>>>>
>> > >>>>> For example, a study using (raw) # of corrected items can provide the
>> > >>>>> following for effect size calculation:
>> > >>>>>
>> > >>>>>          Mean   SD     n
>> > >>>>> group1   4.5    1.12   17
>> > >>>>> group2   4.7    1.59   18
>> > >>>>>
>> > >>>>> My question is: can I calculate an SMD across all such studies, given
>> > >>>>> that their intent is to measure the same thing?
>> > >>>>>
>> > >>>>> Thank you,
>> > >>>>> Luke
>> > >>>>>
>> > >>>>> On Wed, Sep 29, 2021 at 12:12 PM Luke Martinez <martinezlukerm using gmail.com> wrote:
>> > >>>>>>
>> > >>>>>> Dear All,
>> > >>>>>>
>> > >>>>>> I'm doing a meta-analysis where the papers report only "mean" and "sd"
>> > >>>>>> of some form of proportion and/or "mean" and "sd" of corresponding raw
>> > >>>>>> frequencies. (For context, the papers ask students to read, find, and
>> > >>>>>> correct the wrong words in a text.)
>> > >>>>>>
>> > >>>>>> By some form of proportion, I mean, some papers report actual proportions:
>> > >>>>>>
>> > >>>>>> proportion_type1 = # of corrected items / all items needing correction
>> > >>>>>>
>> > >>>>>> Some papers report a modified version of proportions:
>> > >>>>>>
>> > >>>>>> proportion_type2 = # of corrected items / (all items needing
>> > >>>>>> correction + all wrongly corrected items)
>> > >>>>>>
>> > >>>>>> There are other versions of proportions and corresponding raw
>> > >>>>>> frequencies as well. But my question is: given that all these studies
>> > >>>>>> only report "mean" and "sd", can I simply use an SMD effect size?
>> > >>>>>>
>> > >>>>>> Many thanks,
>> > >>>>>> Luke
>> > >>>>
>> > >>>
>> > >>> _______________________________________________
>> > >>> R-sig-meta-analysis mailing list
>> > >>> R-sig-meta-analysis using r-project.org
>> > >>> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis


