[R-meta] "Categorical" moderator varying within and between studies

Simon Harmel sim.harmel at gmail.com
Thu Oct 29 07:32:13 CET 2020


Dear James,

I'm returning to this after a while with a quick question. In your gender
example, you used the term "% female" in your interpretation of the
contextual effect. If the categorical predictor had more than two categories,
would you still use a percentage in your interpretation?

My understanding of contextual effect is below:

The contextual effect is the average difference in the DV between two units
(e.g., subjects) that have the same value on an IV (e.g., the same gender)
but belong to clusters (e.g., schools) whose mean/percentage on that IV
differs by one unit (one percentage point if the IV is categorical?).

Thank you, Simon



On Sun, Jun 7, 2020 at 7:30 AM James Pustejovsky <jepusto using gmail.com> wrote:

> Yes, it’s general and also applies outside the context of meta-analysis.
> See for example Raudenbush & Bryk (2002) for a good discussion on centering
> and contextual effects in hierarchical linear models.
>
> On Jun 6, 2020, at 11:07 PM, Simon Harmel <sim.harmel using gmail.com> wrote:
>
> Many thanks, James. A quick follow-up: the strategy that you described is a
> general regression-modeling strategy, right? I mean, even if we were
> fitting a multilevel model, the fixed-effects part of the formula would
> still have to include the same construction (i.e., *b1 (% female-within)_ij
> + b2 (% female-between)_j*)?
>
> Thanks,
> Simon
>
> On Thu, Jun 4, 2020 at 9:42 AM James Pustejovsky <jepusto using gmail.com>
> wrote:
>
>> Hi Simon,
>>
>> Please keep the listserv cc'd so that others can benefit from these
>> discussions.
>>
>> Unfortunately, I don't think there is any single answer to your
>> question---analytic strategies just depend too much on what your research
>> questions are and the substantive context that you're working in.
>>
>> But speaking generally, the advantages of splitting predictors into
>> within- and between-study versions are twofold. First, doing this
>> provides an understanding of the structure of the data you're working with,
>> in that it forces one to consider *which* predictors have within-study
>> variation and *how much* variation there is (e.g., perhaps many studies
>> have looked at internalizing symptoms, many studies have looked at
>> externalizing symptoms, but only a few have looked at both types of
>> outcomes in the same sample). The second advantage is that within-study
>> predictors have a distinct interpretation from between-study predictors,
>> and the within-study version is often theoretically more
>> interesting/salient. That's because comparisons of effect sizes based on
>> within-study variation hold constant other aspects of the studies that
>> could influence effect size (and that could muddy the interpretation of the
>> moderator).
>>
>> Here is an example that comes up often in research synthesis projects.
>> Suppose that you're interested in whether participant sex moderates the
>> effect of some intervention. Most of the studies in the sample are of type
>> A, such that only aggregated effect sizes can be calculated. For these type
>> A studies, we are able to determine a) the average effect size across the
>> full sample (pooling across sex) and b) the sex composition of the sample
>> (e.g., % female). For a smaller number of studies of type B, we are able to
>> obtain dis-aggregated results for subgroups of male and female
>> participants. For these studies, we are able to determine a) the average
>> effect size for males and b) the average effect size for females, plus c)
>> the sex composition of each of the sub-samples (respectively 0% and 100%
>> female).
>>
>> Without considering within/between variation in the predictor, a
>> meta-regression testing for whether sex is a moderator is:
>>
>> Y_ij = b0 + b1 (% female)_ij + e_ij
>>
>> The coefficient b1 describes how effect size magnitude varies across
>> samples that differ by 1% in the percent of females. But the estimate of
>> this coefficient pools information across studies of type A and studies of
>> type B, essentially assuming that the contextual effects (variance
>> explained by sample composition) are the same as the individual-level
>> moderator effects (how the intervention effect varies between males and
>> females).
>>
>> Now, if we use the within/between decomposition, the meta-regression
>> becomes:
>>
>> Y_ij = b0 + b1 (% female-within)_ij + b2 (% female-between)_j + e_ij
>>
>> In this model, b1 will be estimated *using only the studies of type B*,
>> as an average of the moderator effects for the studies that provide
>> dis-aggregated data. And b2 will be estimated using studies of type A and
>> the study-level average % female in studies of type B. Thus b2 can be
>> interpreted as a pure contextual effect (variance explained by sample
>> composition). Why does this matter? It's because contextual effects usually
>> have a much murkier interpretation than individual-level moderator effects.
>> Maybe this particular intervention has been tested for several different
>> professions (e.g., education, nursing, dentistry, construction), and
>> professions that tend to have higher proportions of females are also those
>> that tend to be lower-status. If there is a positive contextual effect for
>> % female, then it might be that a) the intervention really is more
>> effective for females than for males or b) the intervention is equally
>> effective for males and females but tends to work better when used with
>> lower-status professions. Looking at between/within study variance in the
>> predictor lets us disentangle those possibilities, at least partially.
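To make the decomposition concrete, here is a small base-R sketch with hypothetical numbers (not from any real synthesis). Studies 1 and 2 are type A, contributing one aggregated effect size each; study 3 is type B, contributing separate male and female subgroup effects:

```r
# Hypothetical data: one row per effect size
# pf = proportion female in the (sub)sample producing that effect size
dat <- data.frame(
  study = c(1, 2, 3, 3),            # studies 1-2: type A; study 3: type B
  pf    = c(0.40, 0.70, 0.00, 1.00) # study 3 rows: male and female subgroups
)

# Between-study component: the study-level mean of the predictor
dat$pf_btw <- ave(dat$pf, dat$study)   # 0.40, 0.70, 0.50, 0.50

# Within-study component: deviation from the study mean
dat$pf_wthn <- dat$pf - dat$pf_btw     # 0.00, 0.00, -0.50, 0.50
```

Note that `pf_wthn` is exactly zero for the type A studies, so a coefficient on `pf_wthn` is identified only by the type B study, while `pf_btw` varies only between studies.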
>>
>> James
>>
>> On Wed, Jun 3, 2020 at 9:27 AM Simon Harmel <sim.harmel using gmail.com> wrote:
>>
>>> Indeed, that was the problem, Gerta. Thanks.
>>>
>>> But James, in meta-analysis, having multiple categorical variables, each
>>> with several levels, that vary both within and between studies is very
>>> common.
>>>
>>> So, if we need to do this for each level of each such categorical
>>> variable, it would certainly become a daunting task, in addition to
>>> making the model extremely big.
>>>
>>> My follow-up question is: what is your strategy after you create the
>>> within and between dummies for each such categorical variable? What are
>>> the next steps?
>>>
>>> Thank you very much, Simon
>>>
>>> p.s. After your `robu()` call I get: `Warning message: In
>>> sqrt(eigenval) : NaNs produced`
>>>
>>> On Wed, Jun 3, 2020 at 8:45 AM Gerta Ruecker <
>>> ruecker using imbi.uni-freiburg.de> wrote:
>>>
>>>> Simon
>>>>
>>>> Maybe there should not be a line break between "Relative" and "Rating"?
>>>>
>>>> For characters, for example if they are used as legends, line breaks
>>>> sometimes matter.
>>>>
>>>> Best,
>>>>
>>>> Gerta
>>>>
>>>> On 03.06.2020 at 15:32, James Pustejovsky wrote:
>>>> > I'm not sure what produced that error and I cannot reproduce it. It
>>>> > may have something to do with the version of dplyr. Here's an
>>>> > alternative way to recode the Scoring variable, which might be less
>>>> > prone to versioning differences:
>>>> >
>>>> > library(dplyr)
>>>> > library(fastDummies)
>>>> > library(robumeta)
>>>> >
>>>> > data("oswald2013")
>>>> >
>>>> > oswald_centered <-
>>>> >    oswald2013 %>%
>>>> >
>>>> >    # make dummy variables
>>>> >    mutate(
>>>> >      Scoring = factor(Scoring,
>>>> >                       levels = c("Absolute", "Difference Score", "Relative Rating"),
>>>> >                       labels = c("Absolute", "Difference", "Relative"))
>>>> >    ) %>%
>>>> >    dummy_columns(select_columns = "Scoring") %>%
>>>> >
>>>> >    # centering by study
>>>> >    group_by(Study) %>%
>>>> >    mutate_at(vars(starts_with("Scoring_")),
>>>> >              list(wthn = ~ . - mean(.), btw = ~ mean(.))) %>%
>>>> >
>>>> >    # calculate Fisher Z and variance
>>>> >    mutate(
>>>> >      Z = atanh(R),
>>>> >      V = 1 / (N - 3)
>>>> >    )
>>>> >
>>>> >
>>>> > # Use the predictors in a meta-regression model
>>>> > # with Scoring = Absolute as the omitted category
>>>> >
>>>> > robu(Z ~ Scoring_Difference_wthn + Scoring_Relative_wthn +
>>>> >         Scoring_Difference_btw + Scoring_Relative_btw,
>>>> >       data = oswald_centered, studynum = Study, var.eff.size = V)
>>>> >
>>>> > On Tue, Jun 2, 2020 at 10:20 PM Simon Harmel <sim.harmel using gmail.com>
>>>> wrote:
>>>> >
>>>> >> Many thanks, James! I keep getting the following error when I run
>>>> >> your code:
>>>> >>
>>>> >> Error: unexpected symbol in:
>>>> >> "Rating" = "Relative")
>>>> >> oswald_centered"
>>>> >>
>>>> >> On Tue, Jun 2, 2020 at 10:00 PM James Pustejovsky
>>>> >> <jepusto using gmail.com> wrote:
>>>> >>
>>>> >>> Hi Simon,
>>>> >>>
>>>> >>> The same strategy can be followed by using dummy variables for each
>>>> >>> unique level of the categorical moderator. The idea would be to
>>>> >>> 1) create dummy variables for each category, 2) calculate the
>>>> >>> study-level means of the dummy variables (the between-cluster
>>>> >>> predictors), and 3) calculate the group-mean-centered dummy
>>>> >>> variables (the within-cluster predictors). Just as when you're
>>>> >>> working with regular categorical predictors, you'll have to pick
>>>> >>> one reference level to omit when using these sets of predictors.
>>>> >>>
>>>> >>> Here is an example of how to carry out such calculations in R,
>>>> >>> using the fastDummies package along with a bit of dplyr:
>>>> >>>
>>>> >>> library(dplyr)
>>>> >>> library(fastDummies)
>>>> >>> library(robumeta)
>>>> >>>
>>>> >>> data("oswald2013")
>>>> >>>
>>>> >>> oswald_centered <-
>>>> >>>    oswald2013 %>%
>>>> >>>
>>>> >>>    # make dummy variables
>>>> >>>    mutate(
>>>> >>>      Scoring = recode(Scoring, "Difference Score" = "Difference",
>>>> >>>                       "Relative Rating" = "Relative")
>>>> >>>    ) %>%
>>>> >>>    dummy_columns(select_columns = "Scoring") %>%
>>>> >>>
>>>> >>>    # centering by study
>>>> >>>    group_by(Study) %>%
>>>> >>>    mutate_at(vars(starts_with("Scoring_")),
>>>> >>>              list(wthn = ~ . - mean(.), btw = ~ mean(.))) %>%
>>>> >>>
>>>> >>>    # calculate Fisher Z and variance
>>>> >>>    mutate(
>>>> >>>      Z = atanh(R),
>>>> >>>      V = 1 / (N - 3)
>>>> >>>    )
>>>> >>>
>>>> >>>
>>>> >>> # Use the predictors in a meta-regression model
>>>> >>> # with Scoring = Absolute as the omitted category
>>>> >>>
>>>> >>> robu(Z ~ Scoring_Difference_wthn + Scoring_Relative_wthn +
>>>> >>>        Scoring_Difference_btw + Scoring_Relative_btw,
>>>> >>>      data = oswald_centered, studynum = Study, var.eff.size = V)
>>>> >>>
>>>> >>>
>>>> >>> Kind Regards,
>>>> >>> James
>>>> >>>
>>>> >>> On Tue, Jun 2, 2020 at 6:49 PM Simon Harmel <sim.harmel using gmail.com>
>>>> wrote:
>>>> >>>
>>>> >>>> Hi All,
>>>> >>>>
>>>> >>>> Page 13 (top of the page) of this article,
>>>> >>>> <https://cran.r-project.org/web/packages/robumeta/vignettes/robumetaVignette.pdf>,
>>>> >>>> recommends that if a *continuous moderator* varies both within and
>>>> >>>> across studies in a meta-analysis, one strategy is to break that
>>>> >>>> moderator down into two moderators by:
>>>> >>>>
>>>> >>>> *(a)* taking the mean of the moderator within each study (the
>>>> >>>> between-cluster effect), and
>>>> >>>>
>>>> >>>> *(b)* centering the moderator within each study (the within-cluster
>>>> >>>> effect).
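A minimal base-R sketch of steps (a) and (b), using made-up effect-size rows and a hypothetical continuous moderator (participant age):

```r
# Hypothetical data: two effect sizes per study, continuous moderator `age`
dat <- data.frame(
  study = c(1, 1, 2, 2),
  age   = c(10, 14, 20, 30)
)

# (a) between-cluster version: the mean of the moderator in each study
dat$age_btw <- ave(dat$age, dat$study)   # 12, 12, 25, 25

# (b) within-cluster version: the moderator centered within each study
dat$age_wthn <- dat$age - dat$age_btw    # -2, 2, -5, 5

# The two pieces add back up to the original moderator
all(dat$age_wthn + dat$age_btw == dat$age)  # TRUE
```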
>>>> >>>>
>>>> >>>> BUT what if my original moderator that varies both within and
>>>> >>>> across studies is a *"categorical"* moderator?
>>>> >>>>
>>>> >>>> I would appreciate an R demonstration of the recommended strategy.
>>>> >>>> Thanks,
>>>> >>>> Simon
>>>> >>>>
>>>> >>>>
>>>> >>>> _______________________________________________
>>>> >>>> R-sig-meta-analysis mailing list
>>>> >>>> R-sig-meta-analysis using r-project.org
>>>> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
>>>> >>>>
>>>>
>>>> --
>>>>
>>>> Dr. rer. nat. Gerta Rücker, Dipl.-Math.
>>>>
>>>> Institute of Medical Biometry and Statistics,
>>>> Faculty of Medicine and Medical Center - University of Freiburg
>>>>
>>>> Stefan-Meier-Str. 26, D-79104 Freiburg, Germany
>>>>
>>>> Phone:    +49/761/203-6673
>>>> Fax:      +49/761/203-6680
>>>> Mail:     ruecker using imbi.uni-freiburg.de
>>>> Homepage: https://www.uniklinik-freiburg.de/imbi.html
>>>>
>>>>
