[R-meta] Effect sizes for mixed-effects models

James Pustejovsky jepusto using gmail.com
Mon Dec 16 23:33:02 CET 2019

Hi Lena,

To your first question: the distinction between Brysbaert and Stevens
(2018) and Hedges (2007) has to do with estimation, rather than the
definition of the effect size. Both studies use the same definition of the
effect size parameter (assuming standardization by the total variance).
Brysbaert and Stevens assume that you are working with the results of a
fitted mixed effects model, where the variance components would be
estimated using restricted maximum likelihood (REML). In contrast, Hedges
(2007) uses moment estimators assuming a balanced design. In his notation,
S_B^2 and S_W^2 are the between- and within-cluster sample variances,
respectively, which are not exactly the same as the REML estimators. The (n
- 1) / n term arises because S_B^2 is an overestimate of sigma_B^2 (the
between-cluster population variance). See the explanation on p. 347 in the
section "Estimation of delta_B". In a balanced design (where all clusters
are the same size), the two approaches to calculation should yield
identical estimates of total variance, I think, and even with some
imbalance the total variance estimates (and resulting effect size
estimates) should come very close.
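
To make the balanced-design equivalence concrete, here is a minimal Python
sketch. It is not taken from either paper. It assumes a single group of
equal-sized clusters, uses made-up outcome values, and uses ANOVA-type moment
estimators as a stand-in for REML (the two should agree in a balanced design
as long as the between-cluster estimate is positive). The function name is
hypothetical.

```python
from statistics import mean, variance

def total_variance_two_ways(clusters):
    """clusters: list of equal-length lists of outcomes (balanced design)."""
    n = len(clusters[0])
    assert all(len(c) == n for c in clusters), "sketch assumes a balanced design"
    # S_W^2: pooled within-cluster sample variance (simple average when balanced)
    s_w2 = mean([variance(c) for c in clusters])
    # S_B^2: sample variance of the cluster means
    s_b2 = variance([mean(c) for c in clusters])
    # S_B^2 estimates sigma_B^2 + sigma_W^2 / n, so remove the excess
    sigma_b2_hat = s_b2 - s_w2 / n
    sum_of_components = sigma_b2_hat + s_w2      # "sum of all variance components"
    hedges_form = s_b2 + (n - 1) / n * s_w2      # S_B^2 + ((n - 1) / n) * S_W^2
    return sum_of_components, hedges_form

# Hypothetical outcomes: 3 clusters with 4 observations each
data = [[4.1, 5.0, 3.8, 4.6], [6.2, 5.9, 6.8, 6.1], [5.0, 4.4, 5.3, 4.9]]
print(total_variance_two_ways(data))
```

The two returned values are algebraically the same quantity, which is why the
two formulas only look different: they start from different estimates of the
between-cluster variance.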

To your second question about how to get the degrees of freedom, yes I
think using the total number of participants is probably a good and
conservative approximation.
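
As a rough illustration of how little the choice of v matters, the formulas
from the earlier message quoted below (Vd = (SEb / S)^2 + d^2 / (2 v) and
g = J(v) * d) can be sketched in Python. The function names and the input
values are hypothetical; only the formulas come from this thread.

```python
def hedges_j(v):
    """Small-sample correction factor J(v) = 1 - 3 / (4 v - 1)."""
    return 1 - 3 / (4 * v - 1)

def smd_and_variance(b, se_b, s, v):
    """Return d = b / S and its delta-method variance Vd."""
    d = b / s
    v_d = (se_b / s) ** 2 + d ** 2 / (2 * v)
    return d, v_d

# Hypothetical inputs: mean difference b, its standard error, standardizer S
b, se_b, s = 0.8, 0.5, 2.0
for v in (30, 60):
    d, v_d = smd_and_variance(b, se_b, s, v)
    g = hedges_j(v) * d
    print(f"v={v}: d={d:.3f}, Vd={v_d:.4f}, J={hedges_j(v):.4f}, g={g:.3f}")
```

Running the loop with a few plausible values of v is a quick way to check
whether the approximation matters for a given study.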

To your final question about comparability across between- and
within-subjects designs: comparability hinges on whether the variance
components used in the denominator of d are the same across both types of
designs. In principle, using the methods outlined in my blog post, you
should be able to define and estimate effect sizes that are comparable
across both types of designs. Of course, in practice there may be factors
that differ across the two types of designs. For example, how the
treatment is operationalized in a within-subjects design might be different
from how it is typically operationalized in a between-subjects design. Or
the scales used to assess the outcome might differ between the two types of
designs. Thus, I would recommend approaching this issue both conceptually
and empirically. Conceptually, try to obtain effect size estimates that are
comparable in principle. Then empirically, examine whether effect sizes
differ on average according to the type of design.
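
One rough way to run the empirical check is a subgroup comparison. The Python
sketch below uses hypothetical effect sizes and sampling variances, and a
simple fixed-effect (inverse-variance) pooled mean per design type with a z
statistic for the difference. This is a simplification: in a real synthesis, a
random-effects meta-regression with design type as a moderator would usually
be the more appropriate analysis.

```python
from math import sqrt

def pooled_mean(effects, variances):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1 / v for v in variances]
    est = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se = sqrt(1 / sum(weights))
    return est, se

# Hypothetical d estimates and sampling variances, grouped by design type
between = ([0.42, 0.30, 0.55], [0.040, 0.060, 0.050])
within = ([0.48, 0.61, 0.39], [0.020, 0.030, 0.025])

est_b, se_b = pooled_mean(*between)
est_w, se_w = pooled_mean(*within)
# z statistic for the difference between the two pooled means
z = (est_w - est_b) / sqrt(se_b ** 2 + se_w ** 2)
print(f"between: {est_b:.3f}, within: {est_w:.3f}, z for difference: {z:.2f}")
```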


On Fri, Dec 13, 2019 at 8:00 AM Lena Schäfer <lenaschaefer2304 using gmail.com>
wrote:

> Dear James,
> Thank you so much for the detailed response! I apologize for the delay in
> getting back to you; my graduate school applications got in the way of
> this. Your suggestion is exactly what we have been looking for and your
> blogpost has been very informative. I do have a couple of follow-up
> questions and would be curious to hear what you think:
>  *Calculating Cohen’s d and its variance for mixed-effects models*
> Initially, we planned to follow Brysbaert and Stevens' (2018) suggestion
> to calculate Cohen’s d for mixed-effects models using:
> d = difference in means / sqrt(sum of all variance components).
> Hedges (2007) proposes three approaches toward scaling the treatment
> effect in mixed-effects models, namely by standardizing the mean difference
> by the total variance (i.e., sum of the within- and between-cluster
> components), the within-cluster variance, or the between-cluster variance.
> Intuitively, I understood that Brysbaert and Stevens’ approach also uses
> the total variance to scale the treatment effect since **all** variance
> components are summed up. However, Hedges seems to use another formula
> for deriving dTotal, namely:
> dT = difference in means / sqrt (between-cluster components + ((n-1) / n)
> * within-cluster components).
> Can you help me understand in which cases it would make sense to scale the
> difference in means by sqrt(sum of all variance components) and in which
> cases it would be more reasonable to use sqrt (between-cluster components +
> ((n-1) / n) * within-cluster components)?
> You also provided information on an alternative approach
> towards calculating the variance of Cohen’s d using:
> Vd = (SEb / S)^2 + d^2 / (2 v)
> For our mixed-effects models, I could derive SEb directly from the lme4
> output, and I could substitute the standardizer used for calculating Cohen’s
> d for S (sqrt(sum of all variance components) or sqrt(between-cluster
> components + ((n-1) / n) * within-cluster components)). In an effort to be
> as conservative as possible, I would use the number of participants as the
> degrees of freedom (v). Does this make sense?
> *Comparability of effect sizes derived from between- and within-subjects
> designs*
> Finally, I wonder to which extent the alternative formulas suggested in
> the blogpost allow for comparison across different experimental designs. In
> our meta-analysis, we aim at including effect sizes derived from between-
> and within-subjects designs. To be able to synthesize the results from both
> types of designs in one analysis, we make sure to meet the three criteria
> outlined in Morris and DeShon (2002): 1) all effect sizes were ultimately
> transformed into a common metric (between-subjects metric); 2) the same
> effect of interest was measured in both types of studies; and 3) sampling
> variances for all effect sizes were estimated based on the original design
> of the study (Table 2). Comparing the variance formulas provided in the
> blogpost to the ones provided in Morris & DeShon, it seems like the latter
> ones are slightly larger (and thus more conservative, which seems fine).
> However, I am uncertain about mixing the Morris & DeShon formulas for
> within- and between-subjects designs (to allow for comparison) with
> the alternative formulas you provided for calculating Cohen’s d and its
> respective variance for mixed-effects models. Do you think this might cause
> any problems for the comparability of our effect sizes? I wonder whether
> you have some intuition on whether effect sizes derived using the
> alternative formulas proposed in the blogpost can be compared across different study
> designs.
> Thank you so much for your help. Your time and effort are very much
> appreciated!
> Best wishes,
> Lena Schaefer
> On behalf of a collaborative team that additionally includes Leah
> Somerville (head of the Affective Neuroscience and Development Laboratory),
> Katherine Powers (former postdoc in the Affective Neuroscience and
> Development Laboratory) and Bernd Figner (Radboud University).
> On 22.10.2019 at 05:12, James Pustejovsky <jepusto using gmail.com> wrote:
> Lena,
> The formula you tried from Hedges 2007 is derived under the assumption
> that treatment assignment is at the cluster level, so I don't think it will
> work for your mixed design. The following post might be useful to answer
> your questions:
> https://www.jepusto.com/alternative-formulas-for-the-smd/
> In it, I suggest a quite general approach to estimating the variance of a
> standardized mean difference effect size, even if it is based on a complex
> experimental design. Suppose that you calculate the SMD estimate as
> d = b / S,
> where b is the unstandardized mean difference (which in your design
> involves a combination of within- and between-Ss comparisons) and S is the
> standard deviation of the outcome, which generally might involve a sum of
> multiple variance components. A delta-method approximation to the variance
> of d is
> Vd = (SEb / S)^2 + d^2 / (2 v),
> where SEb is the standard error of b, S is the denominator of the effect
> size estimate, d is the effect size estimate, and v is the degrees of
> freedom of S^2, defined by v = 2[ E(S^2)]^2 / Var(S^2). The SEb should
> usually be reported in primary studies (or can be back-calculated from t
> statistics or CIs). Thus, the only tricky bit is to find the degrees of
> freedom for the standardizing variance S^2. You might need to just make a
> rough approximation, based on for instance the total number of
> participants. Using a rough approximation (e.g., v = 30) should not have
> much effect on the total estimated variance Vd unless d is very large, so
> personally I would not worry too much about getting it perfect.
> As I explain in the post, you can also use the degrees of freedom v to do
> Hedges' g correction, taking
> g = J(v) * d,
> where J(v) = 1 - 3 / (4 * v - 1). Again, it's not worth worrying about
> getting the degrees of freedom perfect. Consider that J(30) = 0.9748 and
> J(60) = 0.9874, so the g estimate will differ by only a tiny amount
> depending on the degrees of freedom you use.
> James
> On Sat, Oct 19, 2019 at 2:41 PM Lena Schäfer <lenaschaefer2304 using gmail.com>
> wrote:
>> Hi everyone,
>> I am writing to ask two questions related to the calculation of effect
>> sizes for mixed-effects models for a meta-analysis.
>> To derive effect sizes for mixed-effects models, we generally follow the
>> Hedges 2007 paper (
>> https://journals.sagepub.com/doi/abs/10.3102/1076998606298043?journalCode=jebb)
>> and a blogpost by Jake Westfall on effect-size calculations for
>> within-subjects designs (
>> http://jakewestfall.org/blog/index.php/2016/03/25/five-different-cohens-d-statistics-for-within-subject-designs/
>> ):
>> 1. Variance for complex mixed-effects models
>> While the calculation of Cohen’s d is unproblematic (formula 8 on page
>> 346 in Hedges, 2007), the calculation of the respective variance turned out
>> to be difficult for complex study designs. Hedges provided the following
>> formula () to derive V(dw):
>> V(dw) = ((NT + NC) / (NT * NC)) * ((1 + (n - 1)p) / (1 - p))
>>         + dw^2 / (2(N - M))
>> with NT referring to the number of observations in the treatment group,
>> NC referring to the number of observations in the control group, N
>> referring to the total number of observations (NT + NC  = N), n referring
>> to the number of observations per cluster, p referring to the ICC, and M
>> referring to the number of clusters.
>> For our meta-analysis, we want to derive the variance related to Cohen’s
>> d for a mixed-subjects design with some participants completing a task only
>> in the control condition and other participants completing the task in the
>> control and in the experimental condition (within-subjects design). Since
>> the number of observations per cluster differs (some participants have 30
>> observations, others have 60) we decided to use the variance formula for
>> unequal cluster sample sizes in which n is substituted with the cluster
>> sample size ñ (formula 18 on page 350):
>> ñ = (NC * Σ_{i=1}^{mT} (nTi)^2) / (NT * N)
>>     + (NT * Σ_{i=1}^{mC} (nCi)^2) / (NC * N)
>> While we expected that this formula would yield a cluster sample
>> size ñ between 30 and 60, it gives us a value of 30 (which is equal to
>> the cluster sample size if this were a between-subjects design). This
>> suggests that the formula cannot account for the participants who are
>> in both the control and the experimental condition. Do you have any advice
>> on how we could derive an accurate variance estimate for such a design?
>> 2. Turning Cohen’s d into Hedges’ g for mixed models
>> Finally, we want to transform Cohen’s d into Hedges’ g using:
>> g = d * (1 - 3 / (4 * df - 1))
>> We are uncertain how to best estimate the dfs in our mixed-models. We
>> considered using Kenward-Roger approximated dfs but this does not seem
>> feasible since we only have access to parts of the raw data-sets used to
>> derive dw and V{dw}. Potentially, another option would be to estimate the
>> dfs via the effective sample size. This seems more feasible since the
>> authors of primary papers provided us with the ICC related to each model.
>> What do you think about this option?
>> If you have any thoughts on this, we would greatly appreciate it if you
>> could let us know what you think. Thank you for taking the time to consider
>> our request, and please don’t hesitate to reach out if anything is unclear.
>> Thank you very much and best regards,
>> Lena Schäfer
>> On behalf of a collaborative team that additionally includes Leah
>> Somerville (head of the Affective Neuroscience and Development Laboratory),
>> Katherine Powers (former postdoc in the Affective Neuroscience and
>> Development Laboratory) and Bernd Figner (Radboud University).