[R-meta] Correcting Hedges' g vs. Log response ratio in nested studies

Yuhang Hu yh342 at nau.edu
Sat Nov 4 03:42:52 CET 2023


Correction: I think "m" has to be the same for all rows from a clustered
study.

Is the following correct?

library(dplyr)

dat <- read.table(header=TRUE,text="
study  gi   vi  cluster n1  n2   m
1     .2   .05  T       25  23   2
1     .3   .08  T       18  11   2
2     1    .1   F       19  21   1
2     2    .2   F       12  36   1")

# WWC cluster correction factor applied to g (icc defaults to .15)
g_cluster <- \(gi, n_bar, N, icc=.15) {
  gi * sqrt(1 - (2 * (n_bar - 1) * icc / (N - 2)))
}

# N and n_bar are computed per study; only clustered rows are corrected
dat %>%
  group_by(study) %>%
  mutate(N = sum(c(n1,n2)), n_bar=N/m,
         gi= ifelse(cluster, g_cluster(gi, n_bar, N), gi))

On Fri, Nov 3, 2023 at 7:37 AM Yuhang Hu <yh342 using nau.edu> wrote:

> Thanks, James.
>
> The part that I struggled with is "a study with m clusters". This makes
> me think that I need to add a new column to my data called "class" to index
> the rows belonging to each class in each study.
>
> Do I now have it right?
>
> dat <- read.table(header=TRUE,text="
> study  gi   vi  cluster n1  n2  class
> 1     .2   .05  T       25  23   1
> 1     .3   .08  T       18  11   2
> 2     1    .1   F       19  21   1
> 2     2    .2   F       12  36   2")
>
> g_cluster <- function(gi, n_bar, N, icc=.15){
>
> gi*sqrt(1 - (2 * (n_bar - 1) * icc / (N - 2)))
> }
>
> dat %>%
>   group_by(study) %>%
>   mutate(m = n_distinct(class), N = sum(c(n1,n2)), n_bar=N/m,
>          gi= ifelse(cluster, g_cluster(gi, n_bar, N), gi))
>
> On Thu, Nov 2, 2023 at 8:07 PM James Pustejovsky <jepusto using gmail.com>
> wrote:
>
>> No. The notation in the WWC handbook is poorly chosen and confusing, so
>> let me offer a slightly different notation to try and clear things up. This
>> correction applies to a study with m clusters, where cluster j has n_j
>> individuals. There are a total of N = sum(n_1,...,n_m) individuals, of
>> which N_t are assigned to the treatment condition and N_c are assigned to
>> the control condition. The average cluster size is then n_bar = N / m. With
>> this notation, the correction factor is
>>
>> sqrt(1 - (2 * (n_bar - 1) * icc / (N - 2))) = sqrt(1 - (2 * (n_bar - 1) * icc / (n_bar * m - 2)))
>>
>> Thus, for a given study, you need to know the icc, the value of n_bar,
>> and the value of N (or the number of clusters m and the total sample size
>> N, from which you can calculate n_bar) in order to calculate the correction
>> factor.
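A minimal R sketch of the correction factor in this notation. It assumes only the formula above. The names wwc_cluster_factor() and wwc_cluster_factor_alt() are hypothetical helpers, not functions from the WWC handbook or from any package. The example values (N = 240, m = 12, ICC = 0.15) are made up. The sketch checks that the two forms agree once N = n_bar * m.

```r
# Hypothetical helper: WWC cluster correction factor from icc, N, and m
#   icc : intraclass correlation
#   N   : total number of individuals in the comparison
#   m   : number of clusters
wwc_cluster_factor <- function(icc, N, m) {
  n_bar <- N / m                               # average cluster size
  sqrt(1 - (2 * (n_bar - 1) * icc) / (N - 2))
}

# Equivalent form with N rewritten as n_bar * m
wwc_cluster_factor_alt <- function(icc, n_bar, m) {
  sqrt(1 - (2 * (n_bar - 1) * icc) / (n_bar * m - 2))
}

# Illustrative values: 240 individuals in 12 clusters (n_bar = 20), ICC = .15
f1 <- wwc_cluster_factor(icc = 0.15, N = 240, m = 12)
f2 <- wwc_cluster_factor_alt(icc = 0.15, n_bar = 20, m = 12)
all.equal(f1, f2)   # TRUE
round(f1, 3)        # about 0.988
```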
>>
>> On Thu, Nov 2, 2023 at 7:32 PM Yuhang Hu <yh342 using nau.edu> wrote:
>>
>>> Thanks so much, James. This is how I got it set up. Do I have it right?
>>>
>>> dat <- read.table(header=TRUE,text="
>>> study  gi   vi  cluster n1  n2
>>> 1     .2   .05  T       25  23
>>> 1     .3   .08  T       18  11
>>> 2     1    .1   F       19  21
>>> 2     2    .2   F       12  36")
>>>
>>> g_cluster <- \(gi, n1, n2, icc=.15){
>>>
>>>   n <- mean(c(n1,n2), na.rm=TRUE)
>>>   N <- sum(c(n1,n2), na.rm=TRUE)
>>>   gi*sqrt( 1-((2*(n-1)*icc)/(N-2)) )
>>> }
>>> library(dplyr)
>>> dat %>%
>>>   group_by(study) %>%
>>>   mutate(gi = ifelse(cluster, g_cluster(gi, n1, n2), gi))
>>>
>>> On Thu, Nov 2, 2023 at 3:22 PM James Pustejovsky <jepusto using gmail.com>
>>> wrote:
>>>
>>>> This correction applies to a single effect size estimate from a given
>>>> study. All of these values (n, N, n1, n2) are therefore specific to the
>>>> study and need to be recorded for each row in a meta-analysis database.
>>>>
>>>> On Nov 2, 2023, at 5:14 PM, Yuhang Hu <yh342 using nau.edu> wrote:
>>>>
>>>>
>>>> Sure. I thought N is n1 + n2, which is unique to each row in the
>>>> dataset.
>>>>
>>>> But it looks like I should compute N as n * m, where "n" is the average of
>>>> n1 and n2 for each row in the data and "m" is constant across all rows in the
>>>> dataset.
>>>>
>>>> Thanks,
>>>> Yuhang
>>>>
>>>> On Thu, Nov 2, 2023 at 2:16 PM James Pustejovsky <jepusto using gmail.com>
>>>> wrote:
>>>>
>>>>> Total sample size is the same thing as the average sample size per
>>>>> cluster times the number of clusters. My previous message is just a
>>>>> restatement of the formula to show how it is related to the number of
>>>>> clusters.
>>>>>
>>>>> On Thu, Nov 2, 2023 at 4:11 PM Yuhang Hu <yh342 using nau.edu> wrote:
>>>>>
>>>>>> Hi James,
>>>>>>
>>>>>> If you look at Eq. E.5.1 on p. 1 of this document
>>>>>> (https://ies.ed.gov/ncee/wwc/Docs/referenceresources/WWC-41-Supplement-508_09212020.pdf),
>>>>>> they define the correction factor as: sqrt( 1-((2*(n-1)*icc)/(N-2)) )
>>>>>>
>>>>>> where N is n1 + n2 (total sample size), and n is the average
>>>>>> number of individuals per cluster.
>>>>>>
>>>>>> Am I missing something? Or is the correction factor linked above from
>>>>>> WWC inaccurate?
>>>>>>
>>>>>> Thank you,
>>>>>> Yuhang
>>>>>>
>>>>>> On Thu, Nov 2, 2023 at 1:51 PM James Pustejovsky <jepusto using gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Responses inline below.
>>>>>>>
>>>>>>> On Thu, Nov 2, 2023 at 3:30 PM Yuhang Hu <yh342 using nau.edu> wrote:
>>>>>>>
>>>>>>>> Regarding your first message, it looks like the correction factor
>>>>>>>> for the SMD is: sqrt( 1-((2*(n-1)*icc)/(N-2)) ), where n is the average cluster
>>>>>>>> size for each comparison in a study and N is the sum of the two groups'
>>>>>>>> sample sizes. So how does the number of clusters affect the
>>>>>>>> correction factor for the SMD, as you indicated?
>>>>>>>>
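>>>>>>> N = n * m, where m is the number of clusters. So the correction
>>>>>>> factor is
>>>>>>> sqrt( 1-((2*(n-1)*icc)/(m * n - 2)) ) ~= sqrt( 1 - 2 * icc / m ),
>>>>>>> where the approximation holds for large n because (n - 1) / (m * n - 2) is close to 1 / m.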
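To see how good the large-n approximation is, here is a quick R comparison. It assumes ICC = 0.2 and uses a small illustrative grid of n (individuals per cluster) and m (clusters). The names exact_factor() and approx_factor() are hypothetical helpers. For m >= 2 the exact factor is never smaller than the approximation, and the gap shrinks as n grows.

```r
# Hypothetical helpers: exact WWC factor with N = n * m, and its large-n limit
exact_factor  <- function(icc, n, m) sqrt(1 - (2 * (n - 1) * icc) / (m * n - 2))
approx_factor <- function(icc, m) sqrt(1 - 2 * icc / m)

# Illustrative design grid (m varies fastest), ICC fixed at 0.2
grid <- expand.grid(m = c(4, 10, 20, 40), n = c(5, 25, 100))
grid$exact    <- exact_factor(icc = 0.2, n = grid$n, m = grid$m)
grid$approx   <- approx_factor(icc = 0.2, m = grid$m)
grid$abs_diff <- abs(grid$exact - grid$approx)

print(grid, digits = 4)
```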
>>>>>>>
>>>>>>>
>>>>>>>> Regarding my initial question, my hunch was that for the SMD, the
>>>>>>>> estimate and its sampling variance are (non-linearly) related to one
>>>>>>>> another. Therefore, correcting the sampling variance for a design issue
>>>>>>>> will necessitate correcting the SMD estimate as well.
>>>>>>>>
>>>>>>>> On the other hand, the LRR estimate and its sampling variance are
>>>>>>>> not as closely related to one another. Therefore, correcting the sampling
>>>>>>>> variance for a design issue will not necessitate correcting the LRR
>>>>>>>> estimate as well.
>>>>>>>>
>>>>>>>>
>>>>>>> No, the issue you've described here is pretty much unrelated to the
>>>>>>> bias correction problem.
>>>>>>>
>>>>>>>
>>>>>>>> On Thu, Nov 2, 2023 at 8:41 AM James Pustejovsky <jepusto using gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> One other thought on this question, for the extra-nerdy.
>>>>>>>>>
>>>>>>>>> The formulas for the Hedges' g SMD estimator involve what
>>>>>>>>> statisticians would call "second-order" bias corrections, meaning
>>>>>>>>> corrections arising from having a limited sample size. In contrast, the
>>>>>>>>> usual estimator of the LRR is just a "plug-in" estimator that works for
>>>>>>>>> large sample sizes but can have small biases with limited sample sizes.
>>>>>>>>> Lajeunesse (2015; https://doi.org/10.1890/14-2402.1) provides
>>>>>>>>> formulas for the second-order bias correction of the LRR estimator with
>>>>>>>>> independent samples. These bias correction formulas actually *would* need
>>>>>>>>> to be different if you have clustered observations. So, the two effect size
>>>>>>>>> metrics are maybe more similar than it initially seemed:
>>>>>>>>> - Both metrics have plug-in estimators that are not really
>>>>>>>>> affected by the dependence structure of the sample, but whose variance
>>>>>>>>> estimators do need to take into account the dependence structure
>>>>>>>>> - Both metrics have second-order corrected estimators, the exact
>>>>>>>>> form for which does need to take into account the dependence structure.
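Here is a rough R sketch of the plug-in and second-order corrected estimators for two independent samples. The Hedges' g correction uses the common approximation J = 1 - 3 / (4 * df - 1). The LRR correction is written in the form we understand Lajeunesse (2015) to give for independent samples, and it should be checked against that paper before use. Neither function accounts for clustering, which is the point of the list above: the corrected forms would change under dependence. The function names and input values are hypothetical.

```r
# Hypothetical helpers for two independent samples (no clustering assumed)

# SMD: plug-in Cohen's d and Hedges' g using the common approximation to J
smd_estimates <- function(m1, m2, sd1, sd2, n1, n2) {
  df <- n1 + n2 - 2
  sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df)  # pooled SD
  d  <- (m1 - m2) / sp                                    # plug-in estimator
  J  <- 1 - 3 / (4 * df - 1)                              # small-sample correction
  c(d = d, g = J * d)
}

# LRR: plug-in estimator plus a second-order bias term, in the form we
# understand Lajeunesse (2015) to give for independent samples (assumption)
lrr_estimates <- function(m1, m2, sd1, sd2, n1, n2) {
  lrr <- log(m1 / m2)                                     # plug-in estimator
  bias_term <- 0.5 * (sd1^2 / (n1 * m1^2) - sd2^2 / (n2 * m2^2))
  c(lrr = lrr, lrr_corrected = lrr + bias_term)
}

# Made-up summary statistics
smd_estimates(m1 = 12, m2 = 10, sd1 = 4, sd2 = 5, n1 = 15, n2 = 15)
lrr_estimates(m1 = 12, m2 = 10, sd1 = 4, sd2 = 5, n1 = 15, n2 = 15)
```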
>>>>>>>>>
>>>>>>>>> James
>>>>>>>>>
>>>>>>>>> On Thu, Nov 2, 2023 at 8:14 AM James Pustejovsky <
>>>>>>>>> jepusto using gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Wolfgang is correct. The WWC correction factor arises because the
>>>>>>>>>> sample variance is not quite unbiased as an estimator for the total
>>>>>>>>>> population variance in a design with clusters of dependent observations,
>>>>>>>>>> which leads to a small bias in the SMD.
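Here is a small Monte Carlo sketch of this point. It uses assumed values: ICC = 0.2, 20 individuals per cluster, 5 clusters per arm, and total variance 1. Under this setup, the pooled within-arm sample variance averages about 1 - 2 * (n - 1) * icc / (N - 2) rather than 1. Dividing a mean difference by its square root therefore inflates the SMD slightly, and the square-root correction factor offsets this. The simulation design and all names are illustrative and are not taken from the WWC documentation.

```r
set.seed(2023)
icc  <- 0.2    # intraclass correlation (assumed)
n    <- 20     # individuals per cluster (assumed)
k    <- 5      # clusters per arm, so m = 2 * k clusters in total (assumed)
N    <- 2 * k * n
reps <- 2000

# One arm: shared cluster effects plus individual errors, total variance 1
simulate_arm <- function() {
  u <- rep(rnorm(k, sd = sqrt(icc)), each = n)   # cluster random effects
  e <- rnorm(k * n, sd = sqrt(1 - icc))          # individual-level errors
  u + e
}

# Pooled within-arm sample variance, as used in the SMD denominator
pooled_var <- replicate(reps, {
  y_t <- simulate_arm()
  y_c <- simulate_arm()
  (sum((y_t - mean(y_t))^2) + sum((y_c - mean(y_c))^2)) / (N - 2)
})

c(simulated = mean(pooled_var),
  theory    = 1 - 2 * (n - 1) * icc / (N - 2))
```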
>>>>>>>>>>
>>>>>>>>>> The thing is, though, this correction factor is usually
>>>>>>>>>> negligible. Say you’ve got a clustered design with n = 21 kids per cluster
>>>>>>>>>> and 20 clusters, and an ICC of 0.2. Then the correction factor is going to
>>>>>>>>>> be about 0.99 and so will make very little difference for the effect size
>>>>>>>>>> estimate. It only starts to matter if you’re looking at studies with very
>>>>>>>>>> few clusters and non-trivial ICCs.
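The arithmetic in this example can be checked directly in R. wwc_factor() is a hypothetical helper that implements the correction factor with N = n * m. The m = 4 case is an assumed contrast that shows how the factor moves away from 1 when there are few clusters.

```r
# Hypothetical helper: WWC correction factor with N = n * m
wwc_factor <- function(icc, n, m) sqrt(1 - (2 * (n - 1) * icc) / (n * m - 2))

wwc_factor(icc = 0.2, n = 21, m = 20)  # about 0.990 (the example above)
wwc_factor(icc = 0.2, n = 21, m = 4)   # about 0.950 (few clusters)
```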
>>>>>>>>>>
>>>>>>>>>> James
>>>>>>>>>>
>>>>>>>>>> > On Nov 2, 2023, at 3:04 AM, Viechtbauer, Wolfgang (NP) via R-sig-meta-analysis <r-sig-meta-analysis using r-project.org> wrote:
>>>>>>>>>> > Dear Yuhang,
>>>>>>>>>> >
>>>>>>>>>> > I haven't looked deeply into this, but an immediate thought I
>>>>>>>>>> > have is that for SMDs, you divide by some measure of variability within the
>>>>>>>>>> > groups. If that measure of variability is affected by your study design,
>>>>>>>>>> > then this will also affect the SMD value. On the other hand, this doesn't
>>>>>>>>>> > have any impact on LRRs since they are only the (log-transformed) ratio of
>>>>>>>>>> > the means.
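Here is a toy R illustration of this point, using made-up summary statistics. Two studies share the same group means but differ in their pooled within-group SD. As a result, the SMD differs between them while the LRR is identical. smd() and lrr() are hypothetical one-line helpers, and the SMD here omits any small-sample correction.

```r
# Hypothetical helpers: raw SMD and log response ratio from summary statistics
smd <- function(m1, m2, sd_pooled) (m1 - m2) / sd_pooled
lrr <- function(m1, m2) log(m1 / m2)

m1 <- 12; m2 <- 10                        # same group means in both studies
sd_pooled <- c(study_a = 2, study_b = 4)  # different within-group variability

smd_values <- sapply(sd_pooled, function(s) smd(m1, m2, s))
lrr_value  <- lrr(m1, m2)

smd_values   # study_a = 1.0, study_b = 0.5
lrr_value    # log(1.2), the same for both studies
```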
>>>>>>>>>> >
>>>>>>>>>> > Best,
>>>>>>>>>> > Wolfgang
>>>>>>>>>> >
>>>>>>>>>> >> -----Original Message-----
>>>>>>>>>> >> From: R-sig-meta-analysis <r-sig-meta-analysis-bounces using r-project.org>
>>>>>>>>>> >> On Behalf Of Yuhang Hu via R-sig-meta-analysis
>>>>>>>>>> >> Sent: Thursday, November 2, 2023 05:42
>>>>>>>>>> >> To: R meta <r-sig-meta-analysis using r-project.org>
>>>>>>>>>> >> Cc: Yuhang Hu <yh342 using nau.edu>
>>>>>>>>>> >> Subject: [R-meta] Correcting Hedges' g vs. Log response ratio in nested studies
>>>>>>>>>> >>
>>>>>>>>>> >> Hello All,
>>>>>>>>>> >>
>>>>>>>>>> >> I know that when correcting Hedges' g (i.e., bias-corrected SMD, aka "g")
>>>>>>>>>> >> in nested studies, we have to **BOTH** adjust our initial "g" and its
>>>>>>>>>> >> sampling variance "vi_g"
>>>>>>>>>> >> (https://ies.ed.gov/ncee/wwc/Docs/referenceresources/WWC-41-Supplement-508_09212020.pdf).
>>>>>>>>>> >>
>>>>>>>>>> >> But when correcting Log Response Ratios (LRR) in nested studies, we have to
>>>>>>>>>> >> **ONLY** adjust the initial sampling variance "vi_LRR", not "LRR" itself
>>>>>>>>>> >> (https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2021-October/003486.html).
>>>>>>>>>> >>
>>>>>>>>>> >> Why do the two correction methods differ for Hedges' g and LRR?
>>>>>>>>>> >>
>>>>>>>>>> >> Thanks,
>>>>>>>>>> >> Yuhang
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>



