[R-meta] Best choice of effect size
James Pustejovsky
jepusto at gmail.com
Mon Oct 4 02:09:45 CEST 2021
Hi Luke,
Responses inline below.
James
On Sun, Oct 3, 2021 at 3:16 PM Luke Martinez <martinezlukerm using gmail.com>
wrote:
> Dear James,
>
> Thank you for the thorough and thought-provoking response. Here are my
> two takeaways:
>
> 1- Your insightful advice seems to be a criticism of SMDs in general,
> due to the use of some form of SD in the denominator, and not just of
> my situation (i.e., studies reporting the M and SD of proportions
> and/or counts), right?
>
I would not go quite that far. The concerns I raised with the SMD are more
salient when dealing with outcomes that are proportions or counts.

> 2- When using SMDs, one has to keep an open eye regarding reliability
> estimates, and factors affecting them (e.g., time provided for the
> test) in the studies and possibly control for them in the analysis,
> right?
>
Yes. Although, I would add that using an effect measure that is invariant
(or at least relatively robust) to such factors is preferable to trying to
account for the factors using meta-regression.
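
To make the invariance point concrete, here is a rough sketch under
classical test theory, where observed-score variance equals true-score
variance divided by reliability. All numbers are hypothetical (they are not
taken from any study in this thread), and the object names are made up for
illustration.

```r
# Hypothetical values, chosen only for illustration
m_t <- 0.45; m_c <- 0.30     # group means (random error does not shift them)
sd_true <- 0.15              # common true-score SD in both groups

reliability <- c(1.0, 0.8, 0.6)

# Classical test theory: observed variance = true variance / reliability
sd_obs <- sd_true / sqrt(reliability)

# The SMD shrinks as reliability drops, because its denominator grows
smd <- (m_t - m_c) / sd_obs

# The log response ratio depends on the means only, so it does not change
lrr <- rep(log(m_t / m_c), length(reliability))

data.frame(reliability, sd_obs, smd, lrr)
```

Under these assumptions the SMD is attenuated by a factor of
sqrt(reliability), while the log response ratio stays constant across rows.
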
> I also wanted to clarify two things:
>
> First, by log-transformed response ratio, you mean "ROM" or "ROMC" as
> represented in metafor::escalc?
>
Yes.
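
As a minimal sketch, here is the log response ratio for the example group
summaries quoted further down in this thread (M = 0.45, SD = 0.17, n = 20
versus M = 0.17, SD = 0.11, n = 19). The hand calculation uses the usual
large-sample variance for the log ratio of two independent means. The object
names are illustrative only, and the metafor call runs only if the package
is installed.

```r
# Group summaries from the example study quoted later in this thread
m1 <- 0.45; sd1 <- 0.17; n1 <- 20   # group 1
m2 <- 0.17; sd2 <- 0.11; n2 <- 19   # group 2

# Log response ratio: a function of the two means only
yi <- log(m1 / m2)

# Large-sample sampling variance (delta method, independent groups);
# the SDs enter here, but not in the effect size itself
vi <- sd1^2 / (n1 * m1^2) + sd2^2 / (n2 * m2^2)

print(round(c(yi = yi, vi = vi), 4))

# The same quantities via metafor, if it is available
if (requireNamespace("metafor", quietly = TRUE)) {
  print(metafor::escalc(measure = "ROM",
                        m1i = m1, m2i = m2, sd1i = sd1, sd2i = sd2,
                        n1i = n1, n2i = n2))
}
```

Note that measure="ROMC" is intended for change within a single group and
additionally needs the correlation between the two measurements (argument
ri), which studies often do not report.
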
> Second, by reference group, you simply mean the mean for each
> treatment group as denoted by M_t in (M_t - M_c) / Pooled_SD?
>
I had in mind the control groups (M_c), although my comment would apply
equally to the treatment groups.

> Respectfully,
> Luke
>
> On Sun, Oct 3, 2021 at 11:31 AM James Pustejovsky <jepusto using gmail.com>
> wrote:
> >
> > Hi Luke,
> >
> > Based on your responses, I think the response ratio could be an
> > appropriate effect measure and further that there could be drawbacks
> > to using the standardized mean difference. Let me note potential
> > drawbacks first.
> >
> > * Variation in the number of possible errors (and perhaps also in the
> > length of the time provided for the test?) suggests that the measures
> > from different studies may have varying degrees of reliability.
> > Varying reliability introduces heterogeneity in the SMD (because the
> > denominator is inflated or shrunk by the degree of reliability).
> >
> > * A relationship between the M and SD of the proportions for a given
> > group suggests that the distribution of the individual-level outcomes
> > might also exhibit mean-variance relationships. (I say "suggests"
> > rather than implies because there's an ecological inference here,
> > i.e., assuming something about individual-level variation on the basis
> > of group-level variation). If this supposition is reasonable, then
> > that introduces a further potential source of heterogeneity in the
> > SMDs (study-to-study variation in the M for the reference group
> > influences the SD of the reference group, thereby inflating or
> > shrinking the SMDs).
> >
> > The response ratio does not have these same concerns because it is a
> > function of the group means alone. (The standard error of the response
> > ratio involves the SD of each group, but the effect size metric itself
> > does not.) Further, you noted that the group means are not too near
> > the extremes of the scale, so the (log-transformed) response ratio
> > should be reasonably "well-behaved" in terms of its sampling
> > distribution.
> >
> > In light of the above, here's how I might proceed if I were conducting
> > this analysis:
> > 1. Calculate *both* SMDs and log-transformed response ratios for the
> > full set of studies.
> > 2. Examine the distribution of effect size estimates for each metric
> > (using histograms or funnel plots). If one of the distributions is
> > skewed or has extreme outliers, take that as an indication that the
> > metric might not be appropriate.
> > 3. Fit meta-analytic models to summarize the distribution of effect
> > sizes in each metric, using a model that appropriately describes the
> > dependence structure of the estimates. Calculate I-squared statistics
> > and give preference to the metric with lower I-squared.
> > 4. If (2) and (3) don't lead to a clearly preferable metric, then
> > choose between SMD and RR based on whichever will make the synthesis
> > results easier to explain to people.
> > 5. (Optional/extra credit) Whichever metric you choose, repeat your
> > main analyses using the other metric and stuff all those results in
> > supplementary materials, to satisfy any inveterate statistical
> > curmudgeons who might review/read your synthesis.
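
Steps 1 to 3 above could be sketched roughly as follows. The data frame is
made up purely for illustration (five hypothetical studies, one effect per
study), and the simple random-effects model via rma() is an assumption that
ignores any dependence among estimates. With several effects per study, a
multilevel model (e.g., rma.mv() with a suitable random-effects structure)
would be the more appropriate choice.

```r
# Hypothetical summary data for five studies (made up for illustration)
dat <- data.frame(
  m1i  = c(0.45, 0.39, 0.52, 0.33, 0.41),
  sd1i = c(0.17, 0.25, 0.20, 0.15, 0.18),
  n1i  = c(20, 25, 30, 18, 22),
  m2i  = c(0.17, 0.18, 0.35, 0.20, 0.28),
  sd2i = c(0.11, 0.16, 0.19, 0.12, 0.15),
  n2i  = c(19, 25, 28, 20, 21)
)

if (requireNamespace("metafor", quietly = TRUE)) {
  # Step 1: compute both metrics for the full set of studies
  es_smd <- metafor::escalc(measure = "SMD", m1i = m1i, sd1i = sd1i,
                            n1i = n1i, m2i = m2i, sd2i = sd2i, n2i = n2i,
                            data = dat)
  es_rom <- metafor::escalc(measure = "ROM", m1i = m1i, sd1i = sd1i,
                            n1i = n1i, m2i = m2i, sd2i = sd2i, n2i = n2i,
                            data = dat)

  # Step 2: inspect the distribution of estimates in each metric
  # (numeric summaries here; hist() or metafor::funnel() for plots)
  print(summary(es_smd$yi))
  print(summary(es_rom$yi))

  # Step 3: fit a random-effects model per metric and compare I-squared
  fit_smd <- metafor::rma(yi, vi, data = es_smd)
  fit_rom <- metafor::rma(yi, vi, data = es_rom)
  print(c(I2_SMD = fit_smd$I2, I2_ROM = fit_rom$I2))
}
```

With only a handful of studies, I-squared is estimated imprecisely, so a
small difference between the two metrics should not be over-interpreted.
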
> >
> > James
> >
> >
> > > On Oct 1, 2021, at 12:39 AM, Luke Martinez <martinezlukerm using gmail.com> wrote:
> > >
> > > Dear James,
> >
> > >
> > > Thank you for the insightful comments. Here are my answers inline:
> > >
> > >>> 1- Is the total number possible the same for the groups being
> > >>> compared within a given study?
> > >
> > > Not necessarily.
> > >
> > >>> 2- Did some studies use passages with many possible errors to be
> > >>> corrected while other studies used passages with just a few errors?
> > >
> > > Yes, that's correct. Passage characteristics are fully coded for as
> > > potential moderators.
> > >
> > >>> 3- Did the difficulty of the passages differ from study to study?
> > >
> > > Yes, that's correct. Studies with more advanced students used more
> > > difficult passages.
> > >
> > >>> 4- Were there very low or very high mean proportions in any studies?
> > >
> > > No, the means were never very close to 0 or 1.
> > >
> > >>> 5- Does there seem to be a relationship between the means and the
> > >>> variances of the proportions of a given group?
> > >
> > > Assuming you mean the following, yes:
> > >
> > > group1_M_prop = c(.39, .18, .13)
> > > group1_SD_prop = c(.25, .16, .13)
> > >
> > > plot(group1_M_prop, group1_SD_prop^2)
> > >
> > > Thanks,
> > > Luke
> > >
> > >> On Thu, Sep 30, 2021 at 10:17 PM James Pustejovsky <jepusto using gmail.com> wrote:
> > >>
> > >> Hi Luke,
> > >>
> > >> To add to Wolfgang's comments, I would suggest that you could also
> > >> consider other effect measures besides the SMD. For example, the
> > >> response ratio is also a scale-free metric that could work with the
> > >> proportion outcomes that you've described, and would also be
> > >> appropriate for raw frequency counts as long as the total number
> > >> possible is the same for the groups being compared within a given
> > >> study.
> > >>
> > >> Whether the response ratio would be more appropriate than the SMD is
> > >> hard to gauge. One would need to know more about how the proportions
> > >> were assessed and how the assessment procedures varied from study to
> > >> study. For instance, did some studies use passages with many possible
> > >> errors to be corrected while other studies used passages with just a
> > >> few errors? Did the difficulty of the passages differ from study to
> > >> study? Were there very low or very high mean proportions in any
> > >> studies? Does there seem to be a relationship between the means and
> > >> the variances of the proportions of a given group?
> > >>
> > >> James
> > >>
> > >>> On Thu, Sep 30, 2021 at 2:22 AM Luke Martinez <martinezlukerm using gmail.com> wrote:
> > >>>
> > >>> Dear Wolfgang,
> > >>>
> > >>> Thank you so much for your response and also the references.
> > >>>
> > >>> I will compute an SMD from the means and SDs of all types of
> > >>> proportions and the raw counts reported in the papers.
> > >>>
> > >>> Instead of a moderator, I thought I would add a random effect for
> > >>> the variation in these types of proportions and raw counts, which
> > >>> will be crossed with studies (I think), because true effects can be
> > >>> correlated (?) due to sharing a study as well as sharing one of
> > >>> these types of proportions or raw counts, right?
> > >>>
> > >>> proportion_type1 = # of corrected items / all items needing correction
> > >>>
> > >>> proportion_type2 = # of corrected items / (all items needing
> > >>> correction + all wrongly corrected items)
> > >>>
> > >>> raw_counts = # of corrected items
> > >>>
> > >>>
> > >>>
> > >>> On Thu, Sep 30, 2021, 1:33 AM Viechtbauer, Wolfgang (SP) <
> > >>> wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
> > >>>
> > >>>> Hi Luke,
> > >>>>
> > >>>> Yes, treating the mean proportions as means is ok -- after all,
> > >>>> they are means. As long as n is not too small (and the true mean
> > >>>> proportion not too close to 0 or 1), then the CLT will also ensure
> > >>>> that the sampling distribution of a mean proportion is
> > >>>> approximately normal.
> > >>>>
> > >>>> We have analyzed such mean proportions in these articles:
> > >>>>
> > >>>> McCurdy, M. P., Viechtbauer, W., Sklenar, A. M., Frankenstein, A. N.,
> > >>>> & Leshikar, E. D. (2020). Theories of the generation effect and the
> > >>>> impact of generation constraint: A meta-analytic review.
> > >>>> Psychonomic Bulletin & Review, 27(6), 1139-1165.
> > >>>> https://doi.org/10.3758/s13423-020-01762-3
> > >>>>
> > >>>> Vachon, H., Viechtbauer, W., Rintala, A., & Myin-Germeys, I. (2019).
> > >>>> Compliance and retention with the experience sampling method over
> > >>>> the continuum of severe mental disorders: Meta-analysis and
> > >>>> recommendations. Journal of Medical Internet Research, 21(12),
> > >>>> e14475. https://doi.org/10.2196/14475
> > >>>>
> > >>>> In these articles, we did not compute standardized mean differences
> > >>>> based on the mean proportions, but one could do so.
> > >>>>
> > >>>> For the data below:
> > >>>>
> > >>>> escalc(measure="SMD", m1i=0.45, m2i=0.17, sd1i=0.17, sd2i=0.11,
> > >>>>        n1i=20, n2i=19)
> > >>>>
> > >>>> If I understand you correctly, the second type are means of counts
> > >>>> (i.e., there is a count for each subject and for example 4.5 is the
> > >>>> mean of those counts). Again, while an individual count might have
> > >>>> other distributional properties (e.g., Poisson or negative
> > >>>> binomial), once you take the mean, it's a mean and the CLT 'kicks
> > >>>> in'. So I would again say: yes, you can treat these as 'regular'
> > >>>> means and compute SMDs based on them.
> > >>>>
> > >>>> For the data below:
> > >>>>
> > >>>> escalc(measure="SMD", m1i=4.5, m2i=4.7, sd1i=1.12, sd2i=1.59,
> > >>>>        n1i=17, n2i=18)
> > >>>>
> > >>>> I might be inclined to code a moderator that distinguishes these
> > >>>> different types, to see if there is some systematic difference
> > >>>> between them.
> > >>>>
> > >>>> Best,
> > >>>> Wolfgang
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Luke Martinez [mailto:martinezlukerm using gmail.com]
> > >>>>> Sent: Thursday, 30 September, 2021 0:32
> > >>>>> To: R meta
> > >>>>> Cc: Viechtbauer, Wolfgang (SP); James Pustejovsky
> > >>>>> Subject: Re: Best choice of effect size
> > >>>>>
> > >>>>> Dear All,
> > >>>>>
> > >>>>> To further clarify, the proportion types (my previous email) are
> > >>>>> used to score each study participant's performance on the text.
> > >>>>> Then, each study reports the "mean" and "sd" of a proportion type
> > >>>>> for control and experimental groups (to then compare them with
> > >>>>> t-tests and ANOVAs).
> > >>>>>
> > >>>>> For example, a study using proportion_type1 (see my previous email)
> > >>>>> can provide the following for effect size calculation:
> > >>>>>
> > >>>>>          Mean   SD     n
> > >>>>> group1   0.45   0.17   20
> > >>>>> group2   0.17   0.11   19
> > >>>>>
> > >>>>> The same is true for studies that use raw frequencies to score each
> > >>>>> study participant's performance on the text. Such studies often
> > >>>>> report the "mean" and "sd" of the # of corrected items (the
> > >>>>> numerator of the proportions in my previous email) for control and
> > >>>>> experimental groups (to then compare them with t-tests and ANOVAs).
> > >>>>>
> > >>>>> For example, a study using (raw) # of corrected items can provide
> > >>>>> the following for effect size calculation:
> > >>>>>
> > >>>>>          Mean   SD     n
> > >>>>> group1   4.5    1.12   17
> > >>>>> group2   4.7    1.59   18
> > >>>>>
> > >>>>> My question is: can I calculate an SMD across all such studies,
> > >>>>> given that their intent is to measure the same thing?
> > >>>>>
> > >>>>> Thank you,
> > >>>>> Luke
> > >>>>>
> > >>>>> On Wed, Sep 29, 2021 at 12:12 PM Luke Martinez <martinezlukerm using gmail.com> wrote:
> > >>>>>>
> > >>>>>> Dear All,
> > >>>>>>
> > >>>>>> I'm doing a meta-analysis where the papers report only "mean" and
> > >>>>>> "sd" of some form of proportion and/or "mean" and "sd" of
> > >>>>>> corresponding raw frequencies. (For context, the papers ask
> > >>>>>> students to read, find, and correct the wrong words in a text.)
> > >>>>>>
> > >>>>>> By some form of proportion, I mean, some papers report actual
> > >>>>>> proportions:
> > >>>>>>
> > >>>>>> proportion_type1 = # of corrected items / all items needing correction
> > >>>>>>
> > >>>>>> Some papers report a modified version of proportions:
> > >>>>>>
> > >>>>>> proportion_type2 = # of corrected items / (all items needing
> > >>>>>> correction + all wrongly corrected items)
> > >>>>>>
> > >>>>>> There are other versions of proportions and corresponding raw
> > >>>>>> frequencies as well. But my question is: given that all these
> > >>>>>> studies only report "mean" and "sd", can I simply use an SMD
> > >>>>>> effect size?
> > >>>>>>
> > >>>>>> Many thanks,
> > >>>>>> Luke
> > >>>>
> > >>>