[R-meta] Correcting Hedges' g vs. Log response ratio in nested studies

James Pustejovsky jepusto at gmail.com
Fri Nov 3 04:06:37 CET 2023


No. The notation in the WWC handbook is poorly chosen and confusing, so let
me offer a slightly different notation to try and clear things up. This
correction applies to a study with m clusters, where cluster j has n_j
individuals. There are a total of N = sum(n_1,...,n_m) individuals, of
which N_t are assigned to the treatment condition and N_c are assigned to
the control condition. The average cluster size is then n_bar = N / m. With
this notation, the correction factor is

    sqrt(1 - 2 * (n_bar - 1) * icc / (N - 2))
      = sqrt(1 - 2 * (n_bar - 1) * icc / (n_bar * m - 2))

Thus, to calculate the correction factor for a given study, you need to know
the icc plus either n_bar and N, or the number of clusters m and the total
sample size N (from which you can calculate n_bar = N / m).
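To make the per-row bookkeeping concrete, here is a minimal base-R sketch, assuming each row of the meta-analysis database records its own total sample size N (n1 + n2) and its own number of clusters m. The function name wwc_cluster_factor and the column m are hypothetical, and the cluster counts in the example data are made up. The key point it encodes is that n_bar is N / m, not the average of n1 and n2. It adjusts only the point estimate gi; adjusting the sampling variance for clustering needs a separate formula that is not shown here.

```r
# Hypothetical helper for the correction factor above, with n_bar = N / m:
#   sqrt(1 - 2 * (n_bar - 1) * icc / (N - 2))
# Vectorized, so it works row by row on a data frame.
wwc_cluster_factor <- function(N, m, icc) {
  n_bar <- N / m  # average cluster size (not the average of n1 and n2)
  sqrt(1 - 2 * (n_bar - 1) * icc / (N - 2))
}

# Illustrative database: one row per effect size. The g values and group
# sizes come from the thread; the cluster counts m are made up.
dat <- data.frame(
  study = c(1, 1),
  gi    = c(0.2, 0.3),
  n1    = c(25, 18),
  n2    = c(23, 11),
  m     = c(4, 3)
)
dat$N      <- dat$n1 + dat$n2  # total sample size for each row
dat$gi_adj <- dat$gi * wwc_cluster_factor(dat$N, dat$m, icc = 0.15)
dat
```

Base R is used here only to keep the sketch free of package dependencies; the same vectorized call works inside dplyr::mutate().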

On Thu, Nov 2, 2023 at 7:32 PM Yuhang Hu <yh342 using nau.edu> wrote:

> Thanks so much, James. This is how I set it up. Do I have it right?
>
> dat <- read.table(header=TRUE,text="
> study  gi   vi  cluster n1  n2
> 1     .2   .05  T       25  23
> 1     .3   .08  T       18  11
> 2     1    .1   F       19  21
> 2     2    .2   F       12  36")
>
> g_cluster <- \(gi, n1, n2, icc=.15){
>
>   n <- mean(c(n1,n2), na.rm=TRUE)
>   N <- sum(c(n1,n2), na.rm=TRUE)
>   gi*sqrt( 1-((2*(n-1)*icc)/(N-2)) )
> }
> library(dplyr)
> dat %>%
>   group_by(study) %>%
>   mutate(gi = ifelse(cluster, g_cluster(gi, n1, n2), gi))
>
> On Thu, Nov 2, 2023 at 3:22 PM James Pustejovsky <jepusto using gmail.com>
> wrote:
>
>> This correction applies to a single effect size estimate from a given
>> study. All of these values (n, N, n1, n2) are therefore specific to the
>> study and need to be recorded for each row in a meta-analysis database.
>>
>> On Nov 2, 2023, at 5:14 PM, Yuhang Hu <yh342 using nau.edu> wrote:
>>
>>
>> Sure, I thought N is n1 + n2, which is unique to each row in the
>> dataset.
>>
>> But it looks like I should compute N as n * m where "n" is (average of
>> n1, n2) for each row in the data but "m" is constant across all rows in the
>> dataset.
>>
>> Thanks,
>> Yuhang
>>
>> On Thu, Nov 2, 2023 at 2:16 PM James Pustejovsky <jepusto using gmail.com>
>> wrote:
>>
>>> Total sample size is the same thing as the average sample size per
>>> cluster times the number of clusters. My previous message is just a
>>> restatement of the formula to show how it is related to the number of
>>> clusters.
>>>
>>> On Thu, Nov 2, 2023 at 4:11 PM Yuhang Hu <yh342 using nau.edu> wrote:
>>>
>>>> Hi James,
>>>>
>>>> If you look at Eq. E.5.1 on p. 1 of this document
>>>> (https://ies.ed.gov/ncee/wwc/Docs/referenceresources/WWC-41-Supplement-508_09212020.pdf),
>>>> they define the correction factor as sqrt( 1-((2*(n-1)*icc)/(N-2)) ),
>>>> where N is n1 + n2 (the total sample size) and n is the average number
>>>> of individuals per cluster.
>>>>
>>>> Am I missing something? Or is the correction factor linked above from
>>>> WWC inaccurate?
>>>>
>>>> Thank you,
>>>> Yuhang
>>>>
>>>> On Thu, Nov 2, 2023 at 1:51 PM James Pustejovsky <jepusto using gmail.com>
>>>> wrote:
>>>>
>>>>> Responses inline below.
>>>>>
>>>>> On Thu, Nov 2, 2023 at 3:30 PM Yuhang Hu <yh342 using nau.edu> wrote:
>>>>>
>>>>>> Regarding your first message, it looks like the correction factor for
>>>>>> SMD is sqrt( 1-((2*(n-1)*icc)/(N-2)) ), where n is the average cluster size
>>>>>> for each comparison in a study and N is the sum of the two groups' sample
>>>>>> sizes. So I wonder how the number of clusters affects the correction
>>>>>> factor for SMD, as you indicated.
>>>>>>
>>>>> N = n * m, where m is the number of clusters. So the correction
>>>>> factor is
>>>>> sqrt(1 - 2 * (n - 1) * icc / (m * n - 2)) ~= sqrt(1 - 2 * icc / m),
>>>>> because (n - 1) / (m * n - 2) is approximately 1 / m when n is large.
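A quick numerical check of that approximation, as a minimal R sketch. The function names exact_cf and approx_cf are hypothetical, and the average cluster size n = 20 and icc = 0.2 are arbitrary illustrative values. The two columns should agree closely, and both move toward 1 as the number of clusters m grows.

```r
# Exact factor in terms of average cluster size n and number of clusters m
exact_cf  <- function(n, m, icc) sqrt(1 - 2 * (n - 1) * icc / (m * n - 2))
# Approximation, treating (n - 1) / (m * n - 2) as 1 / m
approx_cf <- function(m, icc) sqrt(1 - 2 * icc / m)

m <- c(2, 4, 10, 20, 50)  # number of clusters
round(cbind(m,
            exact  = exact_cf(n = 20, m = m, icc = 0.2),
            approx = approx_cf(m = m, icc = 0.2)), 3)
```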
>>>>>
>>>>>
>>>>>> Regarding my initial question, my hunch was that for SMD, the SMD
>>>>>> estimate and its sampling variance are (non-linearly) related to one
>>>>>> another. Therefore, correcting the sampling variance for a design issue
>>>>>> will necessitate correcting the SMD estimate as well.
>>>>>>
>>>>>> On the other hand, the LRR estimate and its sampling variance are not
>>>>>> as closely related to one another. Therefore, correcting the sampling
>>>>>> variance for a design issue will not necessitate correcting the LRR
>>>>>> estimate.
>>>>>>
>>>>>>
>>>>> No, the issue you've described here is pretty much unrelated to the
>>>>> bias correction problem.
>>>>>
>>>>>
>>>>>> On Thu, Nov 2, 2023 at 8:41 AM James Pustejovsky <jepusto using gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> One other thought on this question, for the extra-nerdy.
>>>>>>>
>>>>>>> The formulas for the Hedges' g SMD estimator involve what
>>>>>>> statisticians would call "second-order" bias corrections, meaning
>>>>>>> corrections arising from having a limited sample size. In contrast, the
>>>>>>> usual estimator of the LRR is just a "plug-in" estimator that works for
>>>>>>> large sample sizes but can have small biases with limited sample sizes.
>>>>>>> Lajeunesse (2015; https://doi.org/10.1890/14-2402.1) provides
>>>>>>> formulas for the second-order bias correction of the LRR estimator with
>>>>>>> independent samples. These bias correction formulas actually *would* need
>>>>>>> to be different if you have clustered observations. So, the two effect size
>>>>>>> metrics are maybe more similar than it initially seemed:
>>>>>>> - Both metrics have plug-in estimators that are not really affected
>>>>>>> by the dependence structure of the sample, but whose variance estimators do
>>>>>>> need to take into account the dependence structure
>>>>>>> - Both metrics have second-order corrected estimators, the exact
>>>>>>> form for which does need to take into account the dependence structure.
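To see the two kinds of estimator side by side, here is a minimal sketch for two independent groups (no clustering). The function names are hypothetical. hedges_g uses the common approximation J = 1 - 3 / (4 * df - 1) to the small-sample factor, and lrr_corrected follows the second-order correction in Lajeunesse (2015), which adds 0.5 * (sd1^2 / (n1 * mean1^2) - sd2^2 / (n2 * mean2^2)) to the plug-in log ratio. Neither formula accounts for clustering.

```r
# Independent-groups formulas only; clustered designs would need different
# second-order corrections. Function names are hypothetical.

# Hedges' g: standardized mean difference times the small-sample factor J,
# using the common approximation J = 1 - 3 / (4 * df - 1).
hedges_g <- function(mean1, sd1, n1, mean2, sd2, n2) {
  df <- n1 + n2 - 2
  sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df)  # pooled SD
  J  <- 1 - 3 / (4 * df - 1)
  J * (mean1 - mean2) / sp
}

# Log response ratio: the plug-in estimator ...
lrr_plugin <- function(mean1, mean2) log(mean1 / mean2)

# ... and a second-order bias-corrected version (Lajeunesse, 2015)
lrr_corrected <- function(mean1, sd1, n1, mean2, sd2, n2) {
  log(mean1 / mean2) +
    0.5 * (sd1^2 / (n1 * mean1^2) - sd2^2 / (n2 * mean2^2))
}

# Made-up summary statistics for illustration
hedges_g(10, 2, 20, 8, 2, 20)
lrr_plugin(10, 8)
lrr_corrected(10, 2, 20, 8, 2, 20)
```

With these made-up statistics the LRR correction shifts the estimate by only about -0.0006, which is consistent with the point that the plug-in estimator has small biases unless samples are limited.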
>>>>>>>
>>>>>>> James
>>>>>>>
>>>>>>> On Thu, Nov 2, 2023 at 8:14 AM James Pustejovsky <jepusto using gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Wolfgang is correct. The WWC correction factor arises because the
>>>>>>>> sample variance is not quite unbiased as an estimator for the total
>>>>>>>> population variance in a design with clusters of dependent observations,
>>>>>>>> which leads to a small bias in the SMD.
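The source of that bias can be checked by simulation. This sketch assumes a balanced design with normally distributed cluster effects and errors, total variance 1, and m_arm clusters of n individuals in each arm; sim_pooled_var is a hypothetical helper, and m_arm = 2, n = 10, icc = 0.2 are arbitrary. Under those assumptions the expected pooled within-group variance is 1 - 2 * (n - 1) * icc / (N - 2) rather than 1, which is the quantity under the square root in the WWC factor.

```r
# Hypothetical helper: simulate one two-arm study with m_arm clusters of n
# individuals per arm. Cluster effects and individual errors are normal,
# scaled so the total variance is 1 and the ICC is icc. No treatment effect
# is simulated because only the pooled within-group variance matters here.
sim_pooled_var <- function(m_arm, n, icc) {
  arm_ss <- function() {
    cluster_eff <- rep(rnorm(m_arm, sd = sqrt(icc)), each = n)
    y <- cluster_eff + rnorm(m_arm * n, sd = sqrt(1 - icc))
    sum((y - mean(y))^2)  # sum of squares around the arm mean
  }
  N <- 2 * m_arm * n
  (arm_ss() + arm_ss()) / (N - 2)  # pooled within-group variance
}

set.seed(2023)
m_arm <- 2; n <- 10; icc <- 0.2
N <- 2 * m_arm * n
sims <- replicate(20000, sim_pooled_var(m_arm, n, icc))

mean(sims)                        # Monte Carlo mean; total variance is 1
1 - 2 * (n - 1) * icc / (N - 2)   # expected value under these assumptions
```

Dividing a mean difference by the square root of this too-small variance inflates the SMD slightly, which is what the correction factor is meant to offset.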
>>>>>>>>
>>>>>>>> The thing is, though, this correction factor is usually negligible.
>>>>>>>> Say you’ve got a clustered design with n = 21 kids per cluster and 20
>>>>>>>> clusters, and an ICC of 0.2. Then the correction factor is going to be
>>>>>>>> about 0.99 and so will make very little difference for the effect size
>>>>>>>> estimate. It only starts to matter if you’re looking at studies with very
>>>>>>>> few clusters and non-trivial ICCs.
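The arithmetic in this example can be reproduced directly. In this sketch cf is a hypothetical helper written in terms of the cluster size n and the number of clusters m; the 4-cluster case and the uncorrected g of 0.5 are made-up values added for contrast.

```r
# Correction factor in terms of cluster size n and number of clusters m
cf <- function(n, m, icc) sqrt(1 - 2 * (n - 1) * icc / (n * m - 2))

cf(n = 21, m = 20, icc = 0.2)  # the example in the text: about 0.99
cf(n = 21, m = 4,  icc = 0.2)  # hypothetical study with 4 clusters: about 0.95

# What that means for a hypothetical uncorrected g of 0.5
0.5 * cf(n = 21, m = 20, icc = 0.2)  # about 0.495
0.5 * cf(n = 21, m = 4,  icc = 0.2)  # about 0.475
```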
>>>>>>>>
>>>>>>>> James
>>>>>>>>
>>>>>>>> > On Nov 2, 2023, at 3:04 AM, Viechtbauer, Wolfgang (NP) via
>>>>>>>> > R-sig-meta-analysis <r-sig-meta-analysis using r-project.org> wrote:
>>>>>>>> > Dear Yuhang,
>>>>>>>> >
>>>>>>>> > I haven't looked deeply into this, but an immediate thought I
>>>>>>>> > have is that for SMDs, you divide by some measure of variability within the
>>>>>>>> > groups. If that measure of variability is affected by your study design,
>>>>>>>> > then this will also affect the SMD value. On the other hand, this doesn't
>>>>>>>> > have any impact on LRRs since they are only the (log-transformed) ratio of
>>>>>>>> > the means.
>>>>>>>> >
>>>>>>>> > Best,
>>>>>>>> > Wolfgang
>>>>>>>> >
>>>>>>>> >> -----Original Message-----
>>>>>>>> >> From: R-sig-meta-analysis <r-sig-meta-analysis-bounces using r-project.org>
>>>>>>>> >> On Behalf Of Yuhang Hu via R-sig-meta-analysis
>>>>>>>> >> Sent: Thursday, November 2, 2023 05:42
>>>>>>>> >> To: R meta <r-sig-meta-analysis using r-project.org>
>>>>>>>> >> Cc: Yuhang Hu <yh342 using nau.edu>
>>>>>>>> >> Subject: [R-meta] Correcting Hedges' g vs. Log response ratio in
>>>>>>>> >> nested studies
>>>>>>>> >>
>>>>>>>> >> Hello All,
>>>>>>>> >>
>>>>>>>> >> I know that when correcting Hedges' g (i.e., bias-corrected SMD, aka "g")
>>>>>>>> >> in nested studies, we have to **BOTH** adjust our initial "g" and its
>>>>>>>> >> sampling variance "vi_g"
>>>>>>>> >> (https://ies.ed.gov/ncee/wwc/Docs/referenceresources/WWC-41-Supplement-508_09212020.pdf).
>>>>>>>> >>
>>>>>>>> >> But when correcting Log Response Ratios (LRR) in nested studies, we have to
>>>>>>>> >> **ONLY** adjust its initial sampling variance "vi_LRR" but not "LRR" itself
>>>>>>>> >> (https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2021-October/003486.html).
>>>>>>>> >>
>>>>>>>> >> I wonder why the two methods of correction differ for Hedges' g and LRR?
>>>>>>>> >>
>>>>>>>> >> Thanks,
>>>>>>>> >> Yuhang
>>>>>>>> >
>>>>>>>>
>>>>>>>
