[R-meta] Best choice of effect size
Luke Martinez
martinezlukerm at gmail.com
Sun Oct 17 08:20:58 CEST 2021
Dear James,
I have now had the opportunity to follow your recommendations in this thread.
I calculated the log response ratio (LRR), SMD, and SMDH.
You were right: the LRR estimates look far more symmetric than the SMD and
SMDH estimates (plots attached). The implications were serious. Egger's test
was highly significant under SMD and SMDH, but not under LRR. Likewise, the
outliers under SMD and SMDH were really extreme, but under LRR they are not.
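
For readers following along, here is a minimal base-R sketch of the three effect sizes for a single study, using the example summary statistics quoted further down in this thread (0.45/0.17/20 vs. 0.17/0.11/19). The variable names are illustrative, and the small-sample correction uses the common approximation 1 - 3/(4*df - 1); metafor::escalc() with measure = "ROM", "SMD", or "SMDH" should give values close to these, but that is an assumption and not a reproduction of its internals.

```r
# Summary statistics for one study (example values quoted later in this thread)
m1 <- 0.45; sd1 <- 0.17; n1 <- 20   # group 1
m2 <- 0.17; sd2 <- 0.11; n2 <- 19   # group 2

# Log response ratio: a function of the two means only
lrr <- log(m1 / m2)
# Large-sample (delta-method) sampling variance of the LRR; this is where the SDs enter
lrr_var <- sd1^2 / (n1 * m1^2) + sd2^2 / (n2 * m2^2)

# SMD: mean difference divided by the pooled SD
dof <- n1 + n2 - 2
sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / dof)
j <- 1 - 3 / (4 * dof - 1)          # approximate small-sample correction (assumption)
smd <- j * (m1 - m2) / sd_pooled

# SMDH: mean difference divided by the square root of the average of the two variances
sd_avg <- sqrt((sd1^2 + sd2^2) / 2)
smdh <- j * (m1 - m2) / sd_avg

round(c(LRR = lrr, SMD = smd, SMDH = smdh), 3)
```

Note that both SMD variants change whenever either SD changes, while the LRR point estimate does not.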
This makes me want to understand the nature of these benefits better.
First, "technically", when is the SMD family the first and the best
choice of effect size in a meta-analysis?
Second, I understand that the use of SMD in my case (I dealt with
proportion-like outcomes and raw frequency counts of errors made by subjects
on tests whose number of possible errors and time limits varied across
studies) was affected by varying test reliability, which introduced
heterogeneity in my SMDs. But does that mean that if a test has a
**high reliability estimate**, the SMD could be large because of the
smaller denominator of the SMD, and that this in turn could give rise
to positive outliers? (A part of me says that if an effect size estimate
is larger due to the higher reliability of the measurement, so be it;
it's a legitimate thing.)
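
To make the reliability question concrete, here is a small sketch under classical test theory, where reliability is the ratio of true-score variance to observed-score variance. Under that assumption random measurement error leaves the mean difference unchanged but inflates the observed SD by 1/sqrt(reliability), so the observed SMD equals the true-score SMD times sqrt(reliability). The helper name smd_observed and the numbers are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: observed SMD implied by classical test theory,
# where reliability = true-score variance / observed-score variance
smd_observed <- function(smd_true, reliability) {
  stopifnot(all(reliability > 0 & reliability <= 1))
  smd_true * sqrt(reliability)
}

smd_true    <- 0.5                    # assumed SMD on the true-score scale
reliability <- c(0.5, 0.7, 0.9, 1.0)  # assumed reliabilities of four tests

# Same underlying effect, different observed SMDs
data.frame(reliability, smd = smd_observed(smd_true, reliability))
```

In this sketch, studies with identical underlying effects yield different SMDs purely because of reliability, and the more reliable tests give the larger values.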
Third, the mean and the variance of such outcomes have a natural positive
relation. For each student in a given group attempting "n" questions, the
number of errors has Mu = n*p and Var = (1-p)*Mu, thus p = 1 - (Var / Mu);
for the proportion of errors, Mu_p = p and Var_p = p*(1-p)/n. So, as long as
p is below .5, a smaller mean goes with a smaller variance for each student
by design.
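
The binomial relation described above can be checked numerically. This is only a sketch: n = 20 is a hypothetical number of questions, and the three values of p are borrowed from the example group means below purely for illustration. For a binomial error count the mean is n*p and the variance is n*p*(1-p); for the corresponding proportion the mean is p and the variance is p*(1-p)/n.

```r
n <- 20                      # hypothetical number of questions per student
p <- c(0.39, 0.18, 0.13)     # illustrative error probabilities

# Error counts: mean and variance under a binomial model
mu_count  <- n * p
var_count <- n * p * (1 - p)

# p can be recovered from the count mean and variance
p_recovered <- 1 - var_count / mu_count

# Error proportions (count / n): mean and variance
mu_prop  <- p
var_prop <- p * (1 - p) / n

data.frame(p, mu_count, var_count, p_recovered, mu_prop, sd_prop = sqrt(var_prop))
```

The group SDs reported by a study also include between-student differences in p, so they need not match the within-student values computed here.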
Then, for a given group of students **over time**, if we see
improvement (recall that improvement means making fewer errors) on an
outcome, then we expect the mean of that whole group's proportions to
get smaller, like:
group1_M_prop = c(.39, .18, .13)
and by design we expect the Sd of that whole group's proportions to
get smaller over time as well, like:
group1_SD_prop = c(.25, .16, .13)
So, I wonder why we even need to investigate the relation between Mean
proportion and SD of proportion in each group when this happens
naturally?
Sincerely,
Luke
On Sun, Oct 3, 2021 at 7:51 PM Luke Martinez <martinezlukerm using gmail.com> wrote:
>
> Many thanks. Then, I can use "SMDH" instead of "SMD", to be on the safe side.
>
> Thanks, again,
> Luke
>
> On Sun, Oct 3, 2021 at 7:43 PM James Pustejovsky <jepusto using gmail.com> wrote:
> >
> > I am not entirely sure. But in any case, if there is a mean-variance relationship at the individual level, then any difference in means would imply a differential effect on the variances of the two groups.
> >
> > > On Oct 3, 2021, at 7:18 PM, Luke Martinez <martinezlukerm using gmail.com> wrote:
> > >
> > > Sure, my understanding is that if the relationship between Means and
> > > SDs affects M_c and M_t equally, no major issues arise. But if the two
> > > groups are differentially affected by that relationship, then that can
> > > bias the SMD up or down, no?
> > >
> > >> On Sun, Oct 3, 2021 at 7:09 PM James Pustejovsky <jepusto using gmail.com> wrote:
> > >>
> > >> Hi Luke,
> > >> Responses inline below.
> > >> James
> > >>
> > >>> On Sun, Oct 3, 2021 at 3:16 PM Luke Martinez <martinezlukerm using gmail.com> wrote:
> > >>>
> > >>> Dear James,
> > >>>
> > >>> Thank you for the thorough and thought-provoking response. Here are my
> > >>> two takeaways:
> > >>>
> > >>> 1- Your insightful advice seems to be a criticism of SMDs in
> > >>> general, due to the use of some form of SD in the denominator, and not
> > >>> just of my situation (i.e., studies reporting M and SD
> > >>> of proportions and/or counts), right?
> > >>>
> > >> I would not go quite that far. The concerns I raised with the SMD are more salient when dealing with outcomes that are proportions or counts.
> > >>
> > >>>
> > >>> 2- When using SMDs, one has to keep an open eye regarding reliability
> > >>> estimates, and factors affecting them (e.g., time provided for the
> > >>> test) in the studies and possibly control for them in the analysis,
> > >>> right?
> > >>>
> > >> Yes. Although, I would add that using an effect measure that is invariant (or at least relatively robust) to such factors is preferable to trying to account for the factors using meta-regression.
> > >>
> > >>>
> > >>> I also wanted to clarify two things:
> > >>>
> > >>> First, by log-transformed response ratio, you mean "ROM" or "ROMC" as
> > >>> represented in metafor::escalc?
> > >>>
> > >> Yes.
> > >>
> > >>> Second, by reference group, you simply mean the mean for each
> > >>> treatment group as denoted by M_t in (M_t - M_c) / Pooled_SD?
> > >>>
> > >> I had in mind the control groups (M_c), although my comment would apply equally to the treatment groups.
> > >>
> > >>>
> > >>> Respectfully,
> > >>> Luke
> > >>>
> > >>> On Sun, Oct 3, 2021 at 11:31 AM James Pustejovsky <jepusto using gmail.com> wrote:
> > >>>>
> > >>>> Hi Luke,
> > >>>>
> > >>>> Based on your responses, I think the response ratio could be an
> > >>>> appropriate effect measure and further that there could be drawbacks
> > >>>> to using the standardized mean difference. Let me note potential
> > >>>> drawbacks first.
> > >>>>
> > >>>> * Variation in the number of possible errors (and perhaps also in the
> > >>>> length of the time provided for the test?) suggests that the measures
> > >>>> from different studies may have varying degrees of reliability.
> > >>>> Varying reliability introduces heterogeneity in the SMD (because the
> > >>>> denominator is inflated or shrunk by the degree of reliability).
> > >>>>
> > >>>> * A relationship between the M and SD of the proportions for a given
> > >>>> group suggests that the distribution of the individual-level outcomes
> > >>>> might also exhibit mean-variance relationships. (I say "suggests"
> > >>>> rather than implies because there's an ecological inference here,
> > >>>> i.e., assuming something about individual-level variation on the basis
> > >>>> of group-level variation). If this supposition is reasonable, then
> > >>>> that introduces a further potential source of heterogeneity in the
> > >>>> SMDs (study-to-study variation in the M for the reference group
> > >>>> influences the SD of the reference group, thereby inflating or
> > >>>> shrinking the SMDs).
> > >>>>
> > >>>> The response ratio does not have these same concerns because it is a
> > >>>> function of the group means alone. (The standard error of the response
> > >>>> ratio involves the SD of each group, but the effect size metric itself
> > >>>> does not.) Further, you noted that the group means are not too near
> > >>>> the extremes of the scale, so the (log-transformed) response ratio
> > >>>> should be reasonably "well-behaved" in terms of its sampling
> > >>>> distribution.
> > >>>>
> > >>>> In light of the above, here's how I might proceed if I were conducting
> > >>>> this analysis:
> > >>>> 1. Calculate *both* SMDs and log-transformed response ratios for the
> > >>>> full set of studies.
> > >>>> 2. Examine the distribution of effect size estimates for each metric
> > >>>> (using histograms or funnel plots). If one of the distributions is
> > >>>> skewed or has extreme outliers, take that as an indication that the
> > >>>> metric might not be appropriate.
> > >>>> 3. Fit meta-analytic models to summarize the distribution of effect
> > >>>> sizes in each metric, using a model that appropriately describes the
> > >>>> dependence structure of the estimates. Calculate I-squared statistics,
> > >>>> give preference to the metric with lower I-squared.
> > >>>> 4. If (2) and (3) don't lead to a clearly preferable metric, then
> > >>>> choose between SMD and RR based on whichever will make the synthesis
> > >>>> results easier to explain to people.
> > >>>> 5. (Optional/extra credit) Whichever metric you choose, repeat your
> > >>>> main analyses using the other metric and stuff all those results in
> > >>>> supplementary materials, to satisfy any inveterate statistical
> > >>>> curmudgeons who might review/read your synthesis.
> > >>>>
> > >>>> James
> > >>>>
> > >>>>
> > >>>>> On Oct 1, 2021, at 12:39 AM, Luke Martinez <martinezlukerm using gmail.com> wrote:
> > >>>>>
> > >>>>> Dear James,
> > >>>>
> > >>>>>
> > >>>>> Thank you for the insightful comments. Here are my answers inline:
> > >>>>>
> > >>>>>>> 1- Is the total number possible, the same for the groups being compared within a given study?
> > >>>>>
> > >>>>> Not necessarily.
> > >>>>>
> > >>>>>>> 2- Did some studies use passages with many possible errors to be corrected while other studies used passages with just a few errors?
> > >>>>>
> > >>>>> Yes, that's correct. Passage characteristics are fully coded for as
> > >>>>> potential moderators.
> > >>>>>
> > >>>>>>> 3- Did the difficulty of the passages differ from study to study?
> > >>>>>
> > >>>>> Yes, that's correct. Studies with more advanced students used more
> > >>>>> difficult passages.
> > >>>>>
> > >>>>>>> 4- Were there very low or very high mean proportions in any studies?
> > >>>>>
> > >>>>> No, the means were never very close to 0 or 1.
> > >>>>>
> > >>>>>>> 5- Does there seem to be a relationship between the means and the variances of the proportions of a given group?
> > >>>>>
> > >>>>> Assuming you mean the following, yes:
> > >>>>>
> > >>>>> group1_M_prop = c(.39, .18, .13)
> > >>>>> group1_SD_prop = c(.25, .16, .13)
> > >>>>>
> > >>>>> plot(group1_M_prop, group1_SD_prop^2)
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Luke
> > >>>>>
> > >>>>>> On Thu, Sep 30, 2021 at 10:17 PM James Pustejovsky <jepusto using gmail.com> wrote:
> > >>>>>>
> > >>>>>> Hi Luke,
> > >>>>>>
> > >>>>>> To add to Wolfgang's comments, I would suggest that you could also consider other effect measures besides the SMD. For example, the response ratio is also a scale-free metric that could work with the proportion outcomes that you've described, and would also be appropriate for raw frequency counts as long as the total number possible is the same for the groups being compared within a given study.
> > >>>>>>
> > >>>>>> Whether the response ratio would be more appropriate than the SMD is hard to gauge. One would need to know more about how the proportions were assessed and how the assessment procedures varied from study to study. For instance, did some studies use passages with many possible errors to be corrected while other studies used passages with just a few errors? Did the difficulty of the passages differ from study to study? Were there very low or very high mean proportions in any studies? Does there seem to be a relationship between the means and the variances of the proportions of a given group?
> > >>>>>>
> > >>>>>> James
> > >>>>>>
> > >>>>>>> On Thu, Sep 30, 2021 at 2:22 AM Luke Martinez <martinezlukerm using gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Dear Wolfgang,
> > >>>>>>>
> > >>>>>>> Thank you so much for your response and also the references.
> > >>>>>>>
> > >>>>>>> I will compute an SMD from the means and sds of all types of proportions
> > >>>>>>> and the raw counts reported in the papers.
> > >>>>>>>
> > >>>>>>> Instead of a moderator, I thought I would add a random effect for the variation
> > >>>>>>> in these types of proportions and raw counts, which will be crossed with
> > >>>>>>> studies (I think), because true effects can be correlated (?) due to
> > >>>>>>> sharing a study as well as sharing one of these types of proportions or raw
> > >>>>>>> counts, right?
> > >>>>>>>
> > >>>>>>> proportion_type1 = # of corrected items / all items needing correction
> > >>>>>>>
> > >>>>>>> proportion_type2 = # of corrected items / (all items needing
> > >>>>>>> correction + all wrongly corrected items)
> > >>>>>>>
> > >>>>>>> raw_counts = # of corrected items
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Thu, Sep 30, 2021, 1:33 AM Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
> > >>>>>>>
> > >>>>>>>> Hi Luke,
> > >>>>>>>>
> > >>>>>>>> Yes, treating the mean proportions as means is ok -- after all, they are
> > >>>>>>>> means. As long as n is not too small (and the true mean proportion not too
> > >>>>>>>> close to 0 or 1), then the CLT will also ensure that the sampling
> > >>>>>>>> distribution of a mean proportion is approximately normal.
> > >>>>>>>>
> > >>>>>>>> We have analyzed such mean proportions in these articles:
> > >>>>>>>>
> > >>>>>>>> McCurdy, M. P., Viechtbauer, W., Sklenar, A. M., Frankenstein, A. N., &
> > >>>>>>>> Leshikar, E. D. (2020). Theories of the generation effect and the impact of
> > >>>>>>>> generation constraint: A meta-analytic review. Psychonomic Bulletin &
> > >>>>>>>> Review, 27(6), 1139-1165. https://doi.org/10.3758/s13423-020-01762-3
> > >>>>>>>>
> > >>>>>>>> Vachon, H., Viechtbauer, W., Rintala, A., & Myin-Germeys, I. (2019).
> > >>>>>>>> Compliance and retention with the experience sampling method over the
> > >>>>>>>> continuum of severe mental disorders: Meta-analysis and recommendations.
> > >>>>>>>> Journal of Medical Internet Research, 21(12), e14475.
> > >>>>>>>> https://doi.org/10.2196/14475
> > >>>>>>>>
> > >>>>>>>> In these articles, we did not compute standardized mean differences based
> > >>>>>>>> on the mean proportions, but one could do so.
> > >>>>>>>>
> > >>>>>>>> For the data below:
> > >>>>>>>>
> > >>>>>>>> escalc(measure="SMD", m1i=0.45, m2i=0.17, sd1i=0.17, sd2i=0.11, n1i=20,
> > >>>>>>>> n2i=19)
> > >>>>>>>>
> > >>>>>>>> If I understand you correctly, the second type are means of counts (i.e.,
> > >>>>>>>> there is a count for each subject and for example 4.5 is the mean of those
> > >>>>>>>> counts). Again, while an individual count might have other distributional
> > >>>>>>>> properties (e.g., Poisson or negative binomial), once you take the mean,
> > >>>>>>>> it's a mean and the CLT 'kicks in'. So I would again say: yes, you can
> > >>>>>>>> treat these as 'regular' means and compute SMDs based on them.
> > >>>>>>>>
> > >>>>>>>> For the data below:
> > >>>>>>>>
> > >>>>>>>> escalc(measure="SMD", m1i=4.5, m2i=4.7, sd1i=1.12, sd2i=1.59, n1i=17,
> > >>>>>>>> n2i=18)
> > >>>>>>>>
> > >>>>>>>> I might be inclined to code a moderator that distinguishes these different
> > >>>>>>>> types, to see if there is some systematic difference between them.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Wolfgang
> > >>>>>>>>
> > >>>>>>>>> -----Original Message-----
> > >>>>>>>>> From: Luke Martinez [mailto:martinezlukerm using gmail.com]
> > >>>>>>>>> Sent: Thursday, 30 September, 2021 0:32
> > >>>>>>>>> To: R meta
> > >>>>>>>>> Cc: Viechtbauer, Wolfgang (SP); James Pustejovsky
> > >>>>>>>>> Subject: Re: Best choice of effect size
> > >>>>>>>>>
> > >>>>>>>>> Dear All,
> > >>>>>>>>>
> > >>>>>>>>> To further clarify, the proportion types (my previous email) are used
> > >>>>>>>>> to score each study participant's performance on the text. Then, each
> > >>>>>>>>> study reports the "mean" and "sd" of a proportion type for control and
> > >>>>>>>>> experimental groups (to then compare them with t-tests and ANOVAs).
> > >>>>>>>>>
> > >>>>>>>>> For example, a study using proportion_type1 (see my previous email)
> > >>>>>>>>> can provide the following for effect size calculation:
> > >>>>>>>>>
> > >>>>>>>>>          Mean   SD     n
> > >>>>>>>>> group1   0.45   0.17   20
> > >>>>>>>>> group2   0.17   0.11   19
> > >>>>>>>>>
> > >>>>>>>>> The same is true for studies that use raw frequencies to score each
> > >>>>>>>>> study participant's performance on the text. Such studies often
> > >>>>>>>>> report the "mean" and "sd" of the # of corrected items (numerator of the
> > >>>>>>>>> proportions in my previous email) for control and experimental groups
> > >>>>>>>>> (to then compare them with t-tests and ANOVAs).
> > >>>>>>>>>
> > >>>>>>>>> For example, a study using (raw) # of corrected items can provide the
> > >>>>>>>>> following for effect size calculation:
> > >>>>>>>>>
> > >>>>>>>>>          Mean   SD     n
> > >>>>>>>>> group1   4.5    1.12   17
> > >>>>>>>>> group2   4.7    1.59   18
> > >>>>>>>>>
> > >>>>>>>>> My question is: can I calculate SMD across all such studies, given
> > >>>>>>>>> that their intent is to measure the same thing?
> > >>>>>>>>>
> > >>>>>>>>> Thank you,
> > >>>>>>>>> Luke
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Sep 29, 2021 at 12:12 PM Luke Martinez <martinezlukerm using gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Dear All,
> > >>>>>>>>>>
> > >>>>>>>>>> I'm doing a meta-analysis where the papers report only "mean" and "sd"
> > >>>>>>>>>> of some form of proportion and/or "mean" and "sd" of corresponding raw
> > >>>>>>>>>> frequencies. (For context, the papers ask students to read, find, and
> > >>>>>>>>>> correct the wrong words in a text.)
> > >>>>>>>>>>
> > >>>>>>>>>> By some form of proportion, I mean, some papers report actual proportions:
> > >>>>>>>>>>
> > >>>>>>>>>> proportion_type1 = # of corrected items / all items needing correction
> > >>>>>>>>>>
> > >>>>>>>>>> Some papers report a modified version of proportions:
> > >>>>>>>>>>
> > >>>>>>>>>> proportion_type2 = # of corrected items / (all items needing
> > >>>>>>>>>> correction + all wrongly corrected items)
> > >>>>>>>>>>
> > >>>>>>>>>> There are other versions of proportions and corresponding raw
> > >>>>>>>>>> frequencies as well. But my question is: given that all these studies
> > >>>>>>>>>> only report "mean" and "sd", can I simply use an SMD effect size?
> > >>>>>>>>>>
> > >>>>>>>>>> Many thanks,
> > >>>>>>>>>> Luke
> > >>>>>>>>
> > >>>>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: LRR.PNG
Type: image/png
Size: 34656 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-meta-analysis/attachments/20211017/0958091a/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SMDH.PNG
Type: image/png
Size: 27848 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-meta-analysis/attachments/20211017/0958091a/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SMD.PNG
Type: image/png
Size: 28952 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-meta-analysis/attachments/20211017/0958091a/attachment-0005.png>