[R-meta] Clarification on an answer

Yuhang Hu yh342 at nau.edu
Tue Feb 14 04:50:33 CET 2023


Hi James,

Thank you for your reply. Just to confirm, you mean:

( publications with both levels = (67 + 33) - 96 = 4 )

( studies with both levels = (93 + 54) - 138 = 9 )
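
A minimal sketch of this arithmetic in R, using the nlvls counts from the
output below (object names are illustrative):

    nlvls_full <- c(id = 96, study = 138)   # full model
    nlvls_sub1 <- c(id = 67, study = 93)    # subgroup 1
    nlvls_sub2 <- c(id = 33, study = 54)    # subgroup 2

    # units appearing in both subgroups = sum across subgroups - total
    (nlvls_sub1 + nlvls_sub2) - nlvls_full
    #>    id study
    #>     4     9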

Thank you,
Yuhang

Full Model:
>             estim    sqrt  nlvls  fixed        factor
> sigma^2.1  0.0346  0.1861     96     no            id
> sigma^2.2  0.0000  0.0000    138     no      id/study
> sigma^2.3  0.0356  0.1887    302     no  id/study/esid

Subgroup1:
>             estim    sqrt  nlvls  fixed        factor
> sigma^2.1  0.0401  0.2002     67     no            id
> sigma^2.2  0.0000  0.0000     93     no      id/study
> sigma^2.3  0.0459  0.2142    215     no  id/study/esid

Subgroup2:
>             estim    sqrt  nlvls  fixed        factor
> sigma^2.1  0.0294  0.1713     33     no            id
> sigma^2.2  0.0000  0.0000     54     no      id/study
> sigma^2.3  0.0116  0.1077     87     no  id/study/esid

On Mon, Feb 13, 2023 at 7:50 PM James Pustejovsky <jepusto at gmail.com> wrote:

> I compared the sum of the number of studies (or number of publications)
> across levels of the moderator to the total number of studies (or
> publications). The difference is the number of studies (or publications)
> that include both levels.
>
> On Mon, Feb 13, 2023 at 8:16 PM Yuhang Hu <yh342 at nau.edu> wrote:
>
>> Thank you, James, for the clarification. Regarding my second question, I
>> was particularly interested in how to determine the number of publications,
>> and the number of studies nested within them, that have both levels from
>> the model output (presumably from the "nlvls" column below) rather than
>> from the raw data.
>>
>> Thank you,
>> Yuhang
>>
>> Full Model:
>> >             estim    sqrt  nlvls  fixed        factor
>> > sigma^2.1  0.0346  0.1861     96     no            id
>> > sigma^2.2  0.0000  0.0000    138     no      id/study
>> > sigma^2.3  0.0356  0.1887    302     no  id/study/esid
>>
>> Subgroup1:
>> >             estim    sqrt  nlvls  fixed        factor
>> > sigma^2.1  0.0401  0.2002     67     no            id
>> > sigma^2.2  0.0000  0.0000     93     no      id/study
>> > sigma^2.3  0.0459  0.2142    215     no  id/study/esid
>>
>> Subgroup2:
>> >             estim    sqrt  nlvls  fixed        factor
>> > sigma^2.1  0.0294  0.1713     33     no            id
>> > sigma^2.2  0.0000  0.0000     54     no      id/study
>> > sigma^2.3  0.0116  0.1077     87     no  id/study/esid
>>
>> On Mon, Feb 13, 2023 at 3:43 PM James Pustejovsky <jepusto at gmail.com>
>> wrote:
>>
>>> Hi Yuhang,
>>>
>>> Thanks for your questions.
>>>
>>> On your first question, yes, you are correct: it should be
>>> c("DIAG","DIAG") (diagonal var-cov matrices) rather than c("ID","ID")
>>> (identity matrices).
>>>
>>> On your second question, I think the easiest approach might be to just
>>> count. For a given level of units (studies or samples), summarize the
>>> number of effects included in each unit for each value of the moderator.
>>> Then count the number of units that have both values of the moderator (or,
>>> more generally, that have multiple values of the moderator).
>>>
>>> Section 4.1 of the supplementary materials for Pustejovsky & Tipton
>>> (2022) has some examples of summary tables that get at this question.
>>> Supplementary materials pdf: https://osf.io/nyv4u
>>> Code for supplementary materials: https://osf.io/mahc2
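
A minimal sketch of the counting described above, assuming a data frame
cc_d with publication (id), study, and moderator (tc_fac) columns as in the
models in this thread (helper names are illustrative):

    library(dplyr)

    # number of moderator levels represented within each publication
    pub_counts <- cc_d %>%
      group_by(id) %>%
      summarise(n_levels = n_distinct(tc_fac))
    sum(pub_counts$n_levels > 1)    # publications with both levels

    # same idea one level down, for studies nested in publications
    study_counts <- cc_d %>%
      group_by(id, study) %>%
      summarise(n_levels = n_distinct(tc_fac), .groups = "drop")
    sum(study_counts$n_levels > 1)  # studies with both levels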
>>>
>>> James
>>>
>>>
>>> On Mon, Feb 13, 2023 at 3:49 PM Yuhang Hu via R-sig-meta-analysis <
>>> r-sig-meta-analysis at r-project.org> wrote:
>>>
>>>> Dear Meta-analysis Experts,
>>>>
>>>> I'm hoping to use the forwarded answer below in my model but need two
>>>> clarifications about the answer.
>>>>
>>>> First, bullet point #3 says to fit the model below and then "compare
>>>> this model against the model that assumes homogeneous variance
>>>> components across subgroups."
>>>>
>>>> rma.mv(yi, V = Vsub,
>>>>        mods = ~ tc_fac - 1,
>>>>        random = list(~ tc_fac | id, ~ tc_fac | id:esid),
>>>>        struct = c("ID","ID"),
>>>>        data = cc_d, method = "REML")
>>>>
>>>> But the above model already assumes homogeneous variance components
>>>> across subgroups, so I wonder if the intention was actually to set
>>>> struct = c("DIAG","DIAG")?
>>>>
>>>> Second, the answer mentions "Based on the reported counts from your
>>>> output, it looks like there might be four publications (reporting a
>>>> total of nine studies) that have both levels."
>>>>
>>>> I was wondering how to obtain the number of publications, and the
>>>> number of studies nested within them, that have both levels (so I can
>>>> apply this to my model)?
>>>>
>>>> Thank you for your time,
>>>> Yuhang
>>>> ---------- Forwarded message ---------
>>>> From: James Pustejovsky via R-sig-meta-analysis
>>>> <r-sig-meta-analysis at r-project.org>
>>>> Date: Thu, Feb 9, 2023 at 8:14 PM
>>>> Subject: Re: [R-meta] Different estimates for multilevel meta-analysis
>>>> when using rma.mv with mods or subset
>>>> To: R Special Interest Group for Meta-Analysis
>>>> <r-sig-meta-analysis at r-project.org>
>>>>
>>>> Hi Janis,
>>>>
>>>> There are potentially two things going on here. First, when estimating
>>>> separate models within each subgroup, you allow the heterogeneity
>>>> (variance components) to differ by subgroup. This in turn leads to a
>>>> different weighting of the individual effect size estimates in the
>>>> calculation of the overall average effect sizes than the weighting used
>>>> in the moderator analysis. That difference in weighting might be enough
>>>> to account for the swings in the average ES for each category. To
>>>> determine which model is more appropriate, you could use fit statistics
>>>> or a likelihood ratio test. To get such information, you'll need to
>>>> find a way to express the subgroup analyses in terms of a single model.
>>>> You can do this as follows:
>>>>
>>>> 1. Make a factor variable from targetcommitment so that you don't
>>>> have to repeatedly calculate it:
>>>> cc_d$tc_fac <- factor(cc_d$targetcommitment)
>>>>
>>>> 2. If you created the V matrix using metafor::vcalc(), set the subgroup
>>>> argument to the moderator variable:
>>>> Vsub <- vcalc(..., subgroup = cc_d$tc_fac, ...)
>>>>
>>>> 3. Fit a model using this new V matrix, specifying a random effects
>>>> structure that allows independent effects in each subgroup:
>>>> rma.mv(yi, V = Vsub,
>>>>        mods = ~ tc_fac - 1,
>>>>        random = list(~ tc_fac | id, ~ tc_fac | id:esid),
>>>>        struct = c("ID","ID"),
>>>>        data = cc_d, method = "REML")
>>>> Note that I've omitted the middle level of random effects because there
>>>> doesn't seem to be any variance there after accounting for the id and
>>>> esid levels. The results of (3) should be identical (or nearly so) to
>>>> the results from the subgroup analysis. But now they're embedded within
>>>> one model, so you can get fit statistics or conduct a likelihood ratio
>>>> test to compare this model against the model that assumes homogeneous
>>>> variance components across subgroups.
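
A minimal sketch of that comparison, under the assumption (per the
correction above) that the heterogeneous model uses struct = c("DIAG","DIAG")
and the homogeneous one struct = c("ID","ID"); object names are illustrative:

    library(metafor)

    # homogeneous variance components across subgroups
    fit_hom <- rma.mv(yi, V = Vsub, mods = ~ tc_fac - 1,
                      random = list(~ tc_fac | id, ~ tc_fac | id:esid),
                      struct = c("ID", "ID"),
                      data = cc_d, method = "REML")

    # heterogeneous (subgroup-specific) variance components
    fit_het <- rma.mv(yi, V = Vsub, mods = ~ tc_fac - 1,
                      random = list(~ tc_fac | id, ~ tc_fac | id:esid),
                      struct = c("DIAG", "DIAG"),
                      data = cc_d, method = "REML")

    # likelihood ratio test; fixed effects match, so REML is fine here
    anova(fit_hom, fit_het)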
>>>>
>>>> The second thing that might be going on is that your data seem to
>>>> include a few publications that have both levels of the
>>>> targetcommitment variable. Based on the reported counts from your
>>>> output, it looks like there might be four publications (reporting a
>>>> total of nine studies) that have both levels. Because of this
>>>> structure, the first model you use for moderator analysis will
>>>> calculate average effect size estimates for each level of
>>>> targetcommitment based in part on the effect size estimates *for the
>>>> other category* from the four publications / nine studies that include
>>>> both levels. This has the effect of moving the averages towards each
>>>> other---that is, -.175 and .241 get pulled inward to -.155 and .188.
>>>>
>>>> If you're particularly interested in figuring out the difference
>>>> between levels of targetcommitment, the question is then whether this
>>>> "pulling in" is a reasonable thing to do. You could consider this on a
>>>> conceptual level: is it reasonable to put special emphasis on the
>>>> results from these four publications in estimating the difference
>>>> between levels of targetcommitment? If there's not a clear conceptual
>>>> rationale, then you might consider fitting a slightly different model,
>>>> which includes the study-mean variable and the study-mean-centered
>>>> variable as separate terms.
>>>> For targetcommitment, this would mean calculating the average of the
>>>> dummy variable (targetcommitment == 2) for each study (this gives you
>>>> the first predictor, call it tc_study) and then subtracting the average
>>>> from the original dummy variable (this gives you the second predictor,
>>>> call it tc_within). The model should then use the moderators:
>>>> ~ 1 + tc_study + tc_within
>>>> Tanner-Smith and Tipton (2014; https://doi.org/10.1002/jrsm.1091)
>>>> recommend this as a generic strategy for meta-regression of dependent
>>>> effect sizes.
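
A minimal sketch of this centering step (variable names are illustrative,
and the level tested in the dummy follows the targetcommitment == 2 example
above; substitute the appropriate level for your data):

    # 0/1 dummy for one level of the moderator
    cc_d$tc_dummy <- as.numeric(cc_d$targetcommitment == 2)

    # study-level mean of the dummy (between-study predictor) ...
    cc_d$tc_study <- ave(cc_d$tc_dummy, cc_d$id, cc_d$study)

    # ... and its study-mean-centered version (within-study predictor)
    cc_d$tc_within <- cc_d$tc_dummy - cc_d$tc_study

    rma.mv(yi, V, mods = ~ 1 + tc_study + tc_within,
           random = ~ 1 | id/study/esid, data = cc_d, method = "REML")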
>>>>
>>>> James
>>>>
>>>>
>>>> On Wed, Feb 8, 2023 at 2:40 PM Janis Zickfeld via R-sig-meta-analysis <
>>>> r-sig-meta-analysis at r-project.org> wrote:
>>>>
>>>> > Hi all,
>>>> >
>>>> > my question has already been raised before (e.g.,
>>>> > https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2018-April/000774.html),
>>>> > but the answers did not help me understand the main problem or resolve
>>>> > the differences in the output, so I'm posting it again.
>>>> >
>>>> > We are conducting a meta-analysis with effect sizes ("esid") nested
>>>> > within studies or samples ("study"), which are in turn nested within
>>>> > publications ("id"). There are 344 effect sizes in total across 110
>>>> > publications and 160 study-publication combinations. 48.75% of the
>>>> > studies include more than one effect size, and of these, 26.82% are
>>>> > dependent effects because they either use the same sample or the same
>>>> > control treatment to calculate the effect. We have therefore
>>>> > constructed an approximate variance-covariance matrix to account for
>>>> > this. Some of the moderators only use a subset of the data, as they
>>>> > have missing values (as in the example below).
>>>> >
>>>> > My main problem is now that I get different estimates when running
>>>> > the rma.mv model with factorial moderators via 'mods' versus when
>>>> > subsetting them via the 'subset' argument. I have seen this discussed
>>>> > here
>>>> > (http://www.metafor-project.org/doku.php/tips:comp_two_independent_estimates)
>>>> > and it seems that the main difference is that 'mods' uses the same
>>>> > residual heterogeneity, whereas 'subset' allows for different levels
>>>> > of tau^2. However, this example does not discuss a multilevel
>>>> > meta-analysis.
>>>> >
>>>> > When running 'mods' using the following syntax:
>>>> >
>>>> > rma.mv(yi, V, random = ~ 1 | id/study/esid, data = cc_d,
>>>> >        method = "REML", mods = ~ factor(target_commitment) - 1)
>>>> >
>>>> > I get:
>>>> >
>>>> > Multivariate Meta-Analysis Model (k = 302; method: REML)
>>>> >
>>>> > Variance Components:
>>>> >
>>>> >             estim    sqrt  nlvls  fixed        factor
>>>> > sigma^2.1  0.0346  0.1861     96     no            id
>>>> > sigma^2.2  0.0000  0.0000    138     no      id/study
>>>> > sigma^2.3  0.0356  0.1887    302     no  id/study/esid
>>>> >
>>>> > Test for Residual Heterogeneity:
>>>> > QE(df = 300) = 2213.7041, p-val < .0001
>>>> >
>>>> > Test of Moderators (coefficients 1:2):
>>>> > QM(df = 2) = 54.1962, p-val < .0001
>>>> >
>>>> > Model Results:
>>>> >
>>>> >                              estimate      se     zval    pval    ci.lb    ci.ub
>>>> > factor(target_commitment)1    -0.1551  0.0315  -4.9258  <.0001  -0.2169  -0.0934  ***
>>>> > factor(target_commitment)5     0.1878  0.0420   4.4770  <.0001   0.1056   0.2701  ***
>>>> >
>>>> >
>>>> > However, running:
>>>> >
>>>> > rma.mv(yi, V, random = ~ 1 | id/study/ID2, data = cc_d,
>>>> >        method = "REML", subset = target_commitment == 1)
>>>> >
>>>> > I obtain:
>>>> >
>>>> > Multivariate Meta-Analysis Model (k = 215; method: REML)
>>>> >
>>>> > Variance Components:
>>>> >
>>>> >             estim    sqrt  nlvls  fixed        factor
>>>> > sigma^2.1  0.0401  0.2002     67     no            id
>>>> > sigma^2.2  0.0000  0.0000     93     no      id/study
>>>> > sigma^2.3  0.0459  0.2142    215     no  id/study/esid
>>>> >
>>>> > Test for Heterogeneity:
>>>> > Q(df = 214) = 1876.2588, p-val < .0001
>>>> >
>>>> > Model Results:
>>>> >
>>>> >                    estimate      se     tval   df    pval    ci.lb    ci.ub
>>>> > targetcommitment1   -0.1747  0.0352  -4.9607  214  <.0001  -0.2441  -0.1053  ***
>>>> >
>>>> >
>>>> > Multivariate Meta-Analysis Model (k = 87; method: REML)
>>>> >
>>>> > Variance Components:
>>>> >
>>>> >             estim    sqrt  nlvls  fixed        factor
>>>> > sigma^2.1  0.0294  0.1713     33     no            id
>>>> > sigma^2.2  0.0000  0.0000     54     no      id/study
>>>> > sigma^2.3  0.0116  0.1077     87     no  id/study/esid
>>>> >
>>>> > Test for Heterogeneity:
>>>> > Q(df = 86) = 302.2812, p-val < .0001
>>>> >
>>>> > Model Results:
>>>> >
>>>> >                    estimate      se    tval  df    pval   ci.lb   ci.ub
>>>> > targetcommitment5     0.2407  0.0397  6.0606  86  <.0001  0.1618  0.3197  ***
>>>> >
>>>> >
>>>> > So there is a difference between -.155 and -.175, as well as between
>>>> > .188 and .241. These are not the biggest differences, but I have
>>>> > other moderators for which the differences are larger.
>>>> >
>>>> > I tried to follow the suggestions of one of the previous questions
>>>> > (https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2018-April/000774.html),
>>>> > but this didn't reduce the difference for me.
>>>> >
>>>> > My main question is which of the two approaches is preferable (and
>>>> > whether there is any empirical basis for this). Is it a matter of
>>>> > preference here, or is one approach superior in the present case?
>>>> >
>>>> > I could also post a link to the data if that would be helpful for
>>>> > reproducing the findings.
>>>> >
>>>> > I apologize for double posting this issue and hope that maybe someone
>>>> > can clarify which option I should choose.
>>>> >
>>>> > Best wishes,
>>>> > Janis
>>>> >
>>>>
>>>
>>
>

-- 
Yuhang Hu (She/Her/Hers)
Ph.D. Student in Applied Linguistics
Department of English
Northern Arizona University



