[R-meta] Correcting Hedges' g vs. Log response ratio in nested studies

Fri Nov 3 15:37:19 CET 2023

Thanks, James.

The part that I struggled with is "a study with m clusters". This makes me
think that I need to add a new column to my data called "class" to index
the rows belonging to each class in each study.

Do I now have it right?

dat <- read.table(header=TRUE,text="
study  gi   vi  cluster n1  n2  class
1     .2   .05  T       25  23   1
1     .3   .08  T       18  11   2
2     1    .1   F       19  21   1
2     2    .2   F       12  36   2")

library(dplyr)

# Cluster correction for g; n_bar = average cluster size, N = total sample size
g_cluster <- function(gi, n_bar, N, icc = .15) {
  gi * sqrt(1 - (2 * (n_bar - 1) * icc / (N - 2)))
}

dat %>%
  group_by(study) %>%
  mutate(m = n_distinct(class), N = sum(c(n1, n2)), n_bar = N / m,
         gi = ifelse(cluster, g_cluster(gi, n_bar, N), gi))
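
For reference, the correction factor discussed in this thread can be computed directly from the number of clusters m and the total sample size N. The sketch below is only an illustration under stated assumptions, not the WWC's implementation: the helper name `cluster_correction` and its default `icc = 0.15` are hypothetical. It checks the factor against the example quoted further down the thread (21 individuals per cluster, 20 clusters, ICC of 0.2) and compares it with the approximation sqrt(1 - 2 * icc / m).

```r
# Hypothetical helper: correction factor for a study with m clusters
# and N individuals in total (n_bar = N / m is the average cluster size).
cluster_correction <- function(m, N, icc = 0.15) {
  n_bar <- N / m
  sqrt(1 - (2 * (n_bar - 1) * icc / (N - 2)))
}

# Example from the thread: 21 individuals per cluster, 20 clusters, ICC = 0.2
exact  <- cluster_correction(m = 20, N = 21 * 20, icc = 0.2)
approx <- sqrt(1 - 2 * 0.2 / 20)  # approximation sqrt(1 - 2 * icc / m)

round(c(exact = exact, approx = approx), 3)

# Same cluster size and ICC, but only 4 clusters: the factor moves further from 1
few <- cluster_correction(m = 4, N = 21 * 4, icc = 0.2)
few
```

Both `exact` and `approx` come out at roughly 0.99, in line with the figure quoted in the thread, while the four-cluster case gives a noticeably smaller factor.
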

On Thu, Nov 2, 2023 at 8:07 PM James Pustejovsky <jepusto using gmail.com> wrote:

> No. The notation in the WWC handbook is poorly chosen and confusing, so
> let me offer a slightly different notation to try and clear things up. This
> correction applies to a study with m clusters, where cluster j has n_j
> individuals. There are a total of N = sum(n_1,...,n_m) individuals, of
> which N_t are assigned to the treatment condition and N_c are assigned to
> the control condition. The average cluster size is then n_bar = N / m. With
> this notation, the correction factor is
>
> sqrt(1 - (2 * (n_bar - 1) * icc / (N - 2)))
>   = sqrt(1 - (2 * (n_bar - 1) * icc / (n_bar * m - 2)))
>
> Thus, for a given study, you need to know the icc, the value of n_bar, and
> the value of N (or the number of clusters m and the total sample size N,
> from which you can calculate n_bar) in order to calculate the correction
> factor.
>
> On Thu, Nov 2, 2023 at 7:32 PM Yuhang Hu <yh342 using nau.edu> wrote:
>
>> Thanks so much, James. This is how I got it set up. Do I have it right?
>>
>> dat <- read.table(header=TRUE,text="
>> study  gi   vi  cluster n1  n2
>> 1     .2   .05  T       25  23
>> 1     .3   .08  T       18  11
>> 2     1    .1   F       19  21
>> 2     2    .2   F       12  36")
>>
>> g_cluster <- \(gi, n1, n2, icc=.15){
>>
>>   n <- mean(c(n1,n2), na.rm=TRUE)
>>   N <- sum(c(n1,n2), na.rm=TRUE)
>>   gi*sqrt( 1-((2*(n-1)*icc)/(N-2)) )
>> }
>> library(dplyr)
>>   group_by(dat , study) %>%
>>   mutate(gi= ifelse(cluster, g_cluster(gi,n1,n2),  gi))
>>
>> On Thu, Nov 2, 2023 at 3:22 PM James Pustejovsky <jepusto using gmail.com>
>> wrote:
>>
>>> This correction applies to a single effect size estimate from a given
>>> study. All of these values (n, N, n1, n2) are therefore specific to the
>>> study and need to be recorded for each row in a meta-analysis database.
>>>
>>> On Nov 2, 2023, at 5:14 PM, Yuhang Hu <yh342 using nau.edu> wrote:
>>>
>>> 
>>> Sure, I thought N is n1 + n2, which is unique to each row in the
>>> dataset.
>>>
>>> But it looks like I should compute N as n * m where "n" is (average of
>>> n1, n2) for each row in the data but "m" is constant across all rows in the
>>> dataset.
>>>
>>> Thanks,
>>> Yuhang
>>>
>>> On Thu, Nov 2, 2023 at 2:16 PM James Pustejovsky <jepusto using gmail.com>
>>> wrote:
>>>
>>>> Total sample size is the same thing as the average sample size per
>>>> cluster times the number of clusters. My previous message is just a
>>>> restatement of the formula to show how it is related to the number of
>>>> clusters.
>>>>
>>>> On Thu, Nov 2, 2023 at 4:11 PM Yuhang Hu <yh342 using nau.edu> wrote:
>>>>
>>>>> Hi James,
>>>>>
>>>>> If you look at Eq. number E.5.1 on p1 of this document: (
>>>>> https://ies.ed.gov/ncee/wwc/Docs/referenceresources/WWC-41-Supplement-508_09212020.pdf)
>>>>> they define the correction factor as: sqrt( 1-((2*(n-1)*icc)/(N-2)) )
>>>>> where N is n1 + n2 (total sample size), and n is the average number
>>>>> of individuals per cluster.
>>>>>
>>>>> Am I missing something? Or is the correction factor linked above from
>>>>> WWC inaccurate?
>>>>>
>>>>> Thank you,
>>>>> Yuhang
>>>>>
>>>>> On Thu, Nov 2, 2023 at 1:51 PM James Pustejovsky <jepusto using gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Responses inline below.
>>>>>>
>>>>>> On Thu, Nov 2, 2023 at 3:30 PM Yuhang Hu <yh342 using nau.edu> wrote:
>>>>>>
>>>>>>> Regarding your first message, it looks like the correction factor
>>>>>>> for SMD is: sqrt( 1-((2*(n-1)*icc)/(N-2)) ) where n is the average cluster
>>>>>>> size for each comparison in a study, and N is the sum of the two groups'
>>>>>>> sample sizes. So, I wonder how the number of clusters is impacting the
>>>>>>> correction factor for SMD as you indicated?
>>>>>>>
>>>>>> N = n * m, where m is the number of clusters. So the correction
>>>>>> factor is
>>>>>> sqrt( 1-((2*(n-1)*icc)/(m * n - 2)) ) ~= sqrt( 1 - 2 * icc / m )
>>>>>>
>>>>>>
>>>>>>> Regarding my initial question, my hunch was that for SMD, the SMD
>>>>>>> estimate and its sampling variance are (non-linearly) related to one
>>>>>>> another. Therefore, correcting the sampling variance for a design issue
>>>>>>> will necessitate correcting the SMD estimate as well.
>>>>>>>
>>>>>>> On the other hand, the LRR estimate and its sampling variance are
>>>>>>> not as much related to one another. Therefore, correcting the sampling
>>>>>>> variance for a design issue will not necessitate correcting the LRR
>>>>>>> estimate as well.
>>>>>>>
>>>>>>>
>>>>>> No, the issue you've described here is pretty much unrelated to the
>>>>>> bias correction problem.
>>>>>>
>>>>>>
>>>>>>> On Thu, Nov 2, 2023 at 8:41 AM James Pustejovsky <jepusto using gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> One other thought on this question, for the extra-nerdy.
>>>>>>>>
>>>>>>>> The formulas for the Hedges' g SMD estimator involve what
>>>>>>>> statisticians would call "second-order" bias corrections, meaning
>>>>>>>> corrections arising from having a limited sample size. In contrast, the
>>>>>>>> usual estimator of the LRR is just a "plug-in" estimator that works for
>>>>>>>> large sample sizes but can have small biases with limited sample sizes.
>>>>>>>> Lajeunesse (2015; https://doi.org/10.1890/14-2402.1) provides
>>>>>>>> formulas for the second-order bias correction of the LRR estimator with
>>>>>>>> independent samples. These bias correction formulas actually *would* need
>>>>>>>> to be different if you have clustered observations. So, the two effect size
>>>>>>>> metrics are maybe more similar than it initially seemed:
>>>>>>>> - Both metrics have plug-in estimators that are not really affected
>>>>>>>> by the dependence structure of the sample, but whose variance estimators do
>>>>>>>> need to take into account the dependence structure
>>>>>>>> - Both metrics have second-order corrected estimators, the exact
>>>>>>>> form of which does need to take into account the dependence structure.
>>>>>>>>
>>>>>>>> James
>>>>>>>>
>>>>>>>> On Thu, Nov 2, 2023 at 8:14 AM James Pustejovsky <jepusto using gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Wolfgang is correct. The WWC correction factor arises because the
>>>>>>>>> sample variance is not quite unbiased as an estimator for the total
>>>>>>>>> population variance in a design with clusters of dependent observations,
>>>>>>>>> which leads to a small bias in the SMD.
>>>>>>>>>
>>>>>>>>> The thing is, though, this correction factor is usually
>>>>>>>>> negligible. Say you’ve got a clustered design with n = 21 kids per cluster
>>>>>>>>> and 20 clusters, and an ICC of 0.2. Then the correction factor is going to
>>>>>>>>> be about 0.99 and so will make very little difference for the effect size
>>>>>>>>> estimate. It only starts to matter if you’re looking at studies with very
>>>>>>>>> few clusters and non-trivial ICCs.
>>>>>>>>>
>>>>>>>>> James
>>>>>>>>>
>>>>>>>>> > On Nov 2, 2023, at 3:04 AM, Viechtbauer, Wolfgang (NP) via
>>>>>>>>> > R-sig-meta-analysis <r-sig-meta-analysis using r-project.org> wrote:
>>>>>>>>> > Dear Yuhang,
>>>>>>>>> >
>>>>>>>>> > I haven't looked deeply into this, but an immediate thought I
>>>>>>>>> > have is that for SMDs, you divide by some measure of variability within the
>>>>>>>>> > groups. If that measure of variability is affected by your study design,
>>>>>>>>> > then this will also affect the SMD value. On the other hand, this doesn't
>>>>>>>>> > have any impact on LRRs since they are only the (log-transformed) ratio of
>>>>>>>>> > the means.
>>>>>>>>> >
>>>>>>>>> > Best,
>>>>>>>>> > Wolfgang
>>>>>>>>> >
>>>>>>>>> >> -----Original Message-----
>>>>>>>>> >> From: R-sig-meta-analysis <r-sig-meta-analysis-bounces using r-project.org> On Behalf
>>>>>>>>> >> Of Yuhang Hu via R-sig-meta-analysis
>>>>>>>>> >> Sent: Thursday, November 2, 2023 05:42
>>>>>>>>> >> To: R meta <r-sig-meta-analysis using r-project.org>
>>>>>>>>> >> Cc: Yuhang Hu <yh342 using nau.edu>
>>>>>>>>> >> Subject: [R-meta] Correcting Hedges' g vs. Log response ratio in nested studies
>>>>>>>>> >>
>>>>>>>>> >> Hello All,
>>>>>>>>> >>
>>>>>>>>> >> I know that when correcting Hedges' g (i.e., bias-corrected SMD, aka "g")
>>>>>>>>> >> in nested studies, we have to **BOTH** adjust our initial "g" and its
>>>>>>>>> >> sampling variance "vi_g"
>>>>>>>>> >> (https://ies.ed.gov/ncee/wwc/Docs/referenceresources/WWC-41-Supplement-508_09212020.pdf).
>>>>>>>>> >>
>>>>>>>>> >> But when correcting Log Response Ratios (LRR) in nested studies, we have to
>>>>>>>>> >> **ONLY** adjust its initial sampling variance "vi_LRR" but not "LRR" itself
>>>>>>>>> >> (https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2021-October/003486.html).
>>>>>>>>> >>
>>>>>>>>> >> I wonder why the two methods of correction differ for Hedges' g and LRR?
>>>>>>>>> >>
>>>>>>>>> >> Thanks,
>>>>>>>>> >> Yuhang