# [R-meta] "Categorical" moderator varying within and between studies

Simon Harmel sim.harmel using gmail.com
Thu Oct 29 18:24:25 CET 2020

```
Dear James,

This makes perfect sense, many thanks. However, one thing remains. I know
the contextual effect coefficient is "b_btw - b_wthn". If we have two
categories (as in the case of "gender") and take females as the
reference category, then the contextual effect coefficient will be:

gender_M_btw  - gender_M_wthn

But if we have more than two categories (say we add a third "gender"
category called OTHER), then will the contextual effect coefficient be (sum
of the betweens) - (sum of the withins)?

(gender_M_btw + gender_OTHER_btw) - (gender_M_wthn + gender_OTHER_wthn)

On Thu, Oct 29, 2020 at 9:44 AM James Pustejovsky <jepusto using gmail.com> wrote:

> Hi Simon,
>
> With a binary or categorical predictor, one could operationalize the
> contextual effect in terms of proportions (0-1 scale) or percentages (0-100
> scale). If proportions, like say proportion of vegetarians, then the
> contextual effect would be the average difference in the DV between two
> units who are both vegetarian (i.e., have the same value of the predictor),
> but belong to clusters that are all vegetarian versus all omnivorous (i.e.,
> that differ by one unit in the proportion for that predictor). That will
> make the contextual effects look quite large because it's an extreme
> comparison--absurdly so, in this case, since there can't be a vegetarian in
> a cluster of all omnivores.
>
> If you operationalize the contextual effect in terms of percentages (e.g.,
> % vegetarians) then you get the average difference in the DV between two
> units who are both vegetarian, but belong to clusters that differ by 1
> percentage point in the proportion of vegetarians.
>
> All of this works for multi-category predictors also. Say that you had
> vegetarians, pescatarians, and omnivores, with omnivores as the reference
> category, then the model would include group-mean-centered dummy variables
> for vegetarians and pescatarians, plus group-mean predictors representing
> the proportion/percentage of vegetarians and proportion/percentage of
> pescatarians. You have to omit one category at each level to avoid
> collinearity with the intercept.
>
> James
>
> On Thu, Oct 29, 2020 at 1:32 AM Simon Harmel <sim.harmel using gmail.com> wrote:
>
>> Dear James,
>>
>> I'm returning to this after a while with a quick question. In your gender
>> example, you used the term "%female" in your interpretation of the
>> contextual effect. If the categorical predictor had more than 2 categories,
>> then would you still use the term % in your interpretation?
>>
>> My understanding of contextual effect is below:
>>
>> Contextual effect is the average difference in the DV between two units
>> (e.g., subjects) which have the same value on an IV (e.g., same gender),
>> but belong to clusters (e.g., schools) whose mean/percentage on that IV
>> differs by one unit (is the unit a percentage if the IV is categorical?).
>>
>> Thank you, Simon
>>
>>
>>
>> On Sun, Jun 7, 2020 at 7:30 AM James Pustejovsky <jepusto using gmail.com>
>> wrote:
>>
>>> Yes, it’s general and also applies outside the context of meta-analysis.
>>> See for example Raudenbush & Bryk (2002) for a good discussion on centering
>>> and contextual effects in hierarchical linear models.
>>>
>>> On Jun 6, 2020, at 11:07 PM, Simon Harmel <sim.harmel using gmail.com> wrote:
>>>
>>> Many thanks, James. A quick follow-up: the strategy that you described is
>>> a general regression modeling strategy, right? I mean, even if we were
>>> fitting a multi-level model, the fixed-effects part of the formula would
>>> have to include the same construction (i.e., *b1 (% female-within)_ij + b2
>>> (% female-between)_j*)?
>>>
>>> Thanks,
>>> Simon
>>>
>>> On Thu, Jun 4, 2020 at 9:42 AM James Pustejovsky <jepusto using gmail.com>
>>> wrote:
>>>
>>>> Hi Simon,
>>>>
>>>> Please keep the listserv cc'd so that others can benefit from these
>>>> discussions.
>>>>
>>>> Unfortunately, I don't think there is any single answer to your
>>>> question---analytic strategies just depend too much on what your research
>>>> questions are and the substantive context that you're working in.
>>>>
>>>> But speaking generally, the advantages of splitting predictors into
>>>> within- and between-study versions are two-fold. First is that doing this
>>>> provides an understanding of the structure of the data you're working with,
>>>> in that it forces one to consider *which* predictors have within-study
>>>> variation and *how much* variation there is (e.g., perhaps many
>>>> studies have looked at internalizing symptoms, many studies have looked at
>>>> externalizing symptoms, but only a few have looked at both types of
>>>> outcomes in the same sample). The second advantage is that within-study
>>>> predictors have a distinct interpretation from between-study predictors,
>>>> and the within-study version is often theoretically more
>>>> interesting/salient. That's because comparisons of effect sizes based on
>>>> within-study variation hold constant other aspects of the studies that
>>>> could influence effect size (and that could muddy the interpretation of the
>>>> moderator).
>>>>
>>>> Here is an example that comes up often in research synthesis projects.
>>>> Suppose that you're interested in whether participant sex moderates the
>>>> effect of some intervention. Most of the studies in the sample are of type
>>>> A, such that only aggregated effect sizes can be calculated. For these type
>>>> A studies, we are able to determine a) the average effect size across the
>>>> full sample (pooling across sex) and b) the sex composition of the sample
>>>> (e.g., % female). For a smaller number of studies of type B, we are able to
>>>> obtain dis-aggregated results for subgroups of male and female
>>>> participants. For these studies, we are able to determine a) the average
>>>> effect size for males and b) the average effect size for females, plus c)
>>>> the sex composition of each of the sub-samples (respectively 0% and 100%
>>>> female).
>>>>
>>>> Without considering within/between variation in the predictor, a
>>>> meta-regression testing for whether sex is a moderator is:
>>>>
>>>> Y_ij = b0 + b1 (% female)_ij + e_ij
>>>>
>>>> The coefficient b1 describes how effect size magnitude varies across
>>>> samples that differ by 1% in the percent of females. But the estimate of
>>>> this coefficient pools information across studies of type A and studies of
>>>> type B, essentially assuming that the contextual effects (variance
>>>> explained by sample composition) are the same as the individual-level
>>>> moderator effects (how the intervention effect varies between males and
>>>> females).
>>>>
>>>> Now, if we use the within/between decomposition, the meta-regression
>>>> becomes:
>>>>
>>>> Y_ij = b0 + b1 (% female-within)_ij + b2 (% female-between)_j + e_ij
>>>>
>>>> In this model, b1 will be estimated *using only the studies of type B*,
>>>> as an average of the moderator effects for the studies that provide
>>>> dis-aggregated data. And b2 will be estimated using studies of type A and
>>>> the study-level average % female in studies of type B. Thus b2 can be
>>>> interpreted as a pure contextual effect (variance explained by sample
>>>> composition). Why does this matter? It's because contextual effects usually
>>>> have a much murkier interpretation than individual-level moderator effects.
>>>> Maybe this particular intervention has been tested for several different
>>>> professions (e.g., education, nursing, dentistry, construction), and
>>>> professions that tend to have higher proportions of females are also those
>>>> that tend to be lower-status. If there is a positive contextual effect for
>>>> % female, then it might be that a) the intervention really is more
>>>> effective for females than for males or b) the intervention is equally
>>>> effective for males and females but tends to work better when used with
>>>> lower-status professions. Looking at between/within study variance in the
>>>> predictor lets us disentangle those possibilities, at least partially.
>>>>
>>>> James
>>>>
>>>> On Wed, Jun 3, 2020 at 9:27 AM Simon Harmel <sim.harmel using gmail.com>
>>>> wrote:
>>>>
>>>>> Indeed, that was the problem, Gerta. Thanks.
>>>>>
>>>>> But James, in meta-analysis it is very common to have multiple categorical
>>>>> variables, each with several levels, that vary both within and between
>>>>> studies.
>>>>>
>>>>> So, if we need to do this for each level of each such categorical variable,
>>>>> it would certainly become a daunting task, in addition to making the model
>>>>> extremely big.
>>>>>
>>>>> My follow-up question is what is your strategy after you create
>>>>> within and between dummies for each of such categorical variables? What are
>>>>> the next steps?
>>>>>
>>>>> Thank you very much, Simon
>>>>>
>>>>> p.s. After your `robu()` call I get: `Warning message: In
>>>>> sqrt(eigenval) : NaNs produced`
>>>>>
>>>>> On Wed, Jun 3, 2020 at 8:45 AM Gerta Ruecker <
>>>>> ruecker using imbi.uni-freiburg.de> wrote:
>>>>>
>>>>>> Simon
>>>>>>
>>>>>> Maybe there should not be a line break between "Relative and Rating"?
>>>>>>
>>>>>> For characters, for example if they are used as legends, line breaks
>>>>>> sometimes matter.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Gerta
>>>>>>
>>>>>> On 03.06.2020 at 15:32, James Pustejovsky wrote:
>>>>>> > I'm not sure what produced that error and I cannot reproduce it. It may
>>>>>> > have something to do with the version of dplyr. Here's an alternative way
>>>>>> > to recode the Scoring variable, which might be less prone to versioning
>>>>>> > differences:
>>>>>> >
>>>>>> > library(dplyr)
>>>>>> > library(fastDummies)
>>>>>> > library(robumeta)
>>>>>> >
>>>>>> > data("oswald2013")
>>>>>> >
>>>>>> > oswald_centered <-
>>>>>> >    oswald2013 %>%
>>>>>> >
>>>>>> >    # make dummy variables
>>>>>> >    mutate(
>>>>>> >      Scoring = factor(Scoring,
>>>>>> >                       levels = c("Absolute", "Difference Score",
>>>>>> >                                  "Relative Rating"),
>>>>>> >                       labels = c("Absolute", "Difference", "Relative"))
>>>>>> >    ) %>%
>>>>>> >    dummy_columns(select_columns = "Scoring") %>%
>>>>>> >
>>>>>> >    # centering by study
>>>>>> >    group_by(Study) %>%
>>>>>> >    mutate_at(vars(starts_with("Scoring_")),
>>>>>> >              list(wthn = ~ . - mean(.), btw = ~ mean(.))) %>%
>>>>>> >
>>>>>> >    # calculate Fisher Z and variance
>>>>>> >    mutate(
>>>>>> >      Z = atanh(R),
>>>>>> >      V = 1 / (N - 3)
>>>>>> >    )
>>>>>> >
>>>>>> >
>>>>>> > # Use the predictors in a meta-regression model
>>>>>> > # with Scoring = Absolute as the omitted category
>>>>>> >
>>>>>> > robu(Z ~ Scoring_Difference_wthn + Scoring_Relative_wthn +
>>>>>> >         Scoring_Difference_btw + Scoring_Relative_btw,
>>>>>> >       data = oswald_centered, studynum = Study, var.eff.size = V)
>>>>>> >
>>>>>> > On Tue, Jun 2, 2020 at 10:20 PM Simon Harmel <sim.harmel using gmail.com> wrote:
>>>>>> >
>>>>>> >> Many thanks, James! I keep getting the following error when I run your
>>>>>> >> code:
>>>>>> >>
>>>>>> >> Error: unexpected symbol in:
>>>>>> >> "Rating" = "Relative")
>>>>>> >> oswald_centered"
>>>>>> >>
>>>>>> >> On Tue, Jun 2, 2020 at 10:00 PM James Pustejovsky <jepusto using gmail.com>
>>>>>> >> wrote:
>>>>>> >>
>>>>>> >>> Hi Simon,
>>>>>> >>>
>>>>>> >>> The same strategy can be followed by using dummy variables for each
>>>>>> >>> unique level of a categorical moderator. The idea would be to 1) create
>>>>>> >>> dummy variables for each category, 2) calculate the study-level means of
>>>>>> >>> the dummy variables (between-cluster predictors), and 3) calculate the
>>>>>> >>> group-mean centered dummy variables (within-cluster predictors). Just as
>>>>>> >>> when working with regular categorical predictors, you'll have to pick
>>>>>> >>> one reference level to omit when using these sets of predictors.
>>>>>> >>>
>>>>>> >>> Here is an example of how to carry out such calculations in R, using the
>>>>>> >>> fastDummies package along with a bit of dplyr:
>>>>>> >>>
>>>>>> >>> library(dplyr)
>>>>>> >>> library(fastDummies)
>>>>>> >>> library(robumeta)
>>>>>> >>>
>>>>>> >>> data("oswald2013")
>>>>>> >>>
>>>>>> >>> oswald_centered <-
>>>>>> >>>    oswald2013 %>%
>>>>>> >>>
>>>>>> >>>    # make dummy variables
>>>>>> >>>    mutate(
>>>>>> >>>      Scoring = recode(Scoring, "Difference Score" = "Difference",
>>>>>> >>> "Relative Rating" = "Relative")
>>>>>> >>>    ) %>%
>>>>>> >>>    dummy_columns(select_columns = "Scoring") %>%
>>>>>> >>>
>>>>>> >>>    # centering by study
>>>>>> >>>    group_by(Study) %>%
>>>>>> >>>    mutate_at(vars(starts_with("Scoring_")),
>>>>>> >>>              list(wthn = ~ . - mean(.), btw = ~ mean(.))) %>%
>>>>>> >>>
>>>>>> >>>    # calculate Fisher Z and variance
>>>>>> >>>    mutate(
>>>>>> >>>      Z = atanh(R),
>>>>>> >>>      V = 1 / (N - 3)
>>>>>> >>>    )
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> # Use the predictors in a meta-regression model
>>>>>> >>> # with Scoring = Absolute as the omitted category
>>>>>> >>>
>>>>>> >>> robu(Z ~ Scoring_Difference_wthn + Scoring_Relative_wthn +
>>>>>> >>>         Scoring_Difference_btw + Scoring_Relative_btw,
>>>>>> >>>       data = oswald_centered, studynum = Study, var.eff.size = V)
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Kind Regards,
>>>>>> >>> James
>>>>>> >>>
>>>>>> >>> On Tue, Jun 2, 2020 at 6:49 PM Simon Harmel <sim.harmel using gmail.com> wrote:
>>>>>> >>>
>>>>>> >>>> Hi All,
>>>>>> >>>>
>>>>>> >>>> The robumeta vignette
>>>>>> >>>> <https://cran.r-project.org/web/packages/robumeta/vignettes/robumetaVignette.pdf>
>>>>>> >>>> (*top of the page*) recommends that if a *continuous moderator* varies
>>>>>> >>>> both within and across studies in a meta-analysis, a strategy is to break
>>>>>> >>>> that moderator down into two moderators by:
>>>>>> >>>>
>>>>>> >>>> *(a)* taking the mean of each study (between-cluster effect),
>>>>>> >>>>
>>>>>> >>>> *(b)* centering the predictor within each study (within-cluster effect).
>>>>>> >>>>
>>>>>> >>>> BUT what if my original moderator that varies both within and across
>>>>>> >>>> studies is a *"categorical"* moderator?
>>>>>> >>>>
>>>>>> >>>> I would appreciate an R demonstration of the recommended strategy.
>>>>>> >>>> Thanks,
>>>>>> >>>> Simon
>>>>>> >>>>
>>>>>> >>>>          [[alternative HTML version deleted]]
>>>>>> >>>>
>>>>>> >>>> _______________________________________________
>>>>>> >>>> R-sig-meta-analysis mailing list
>>>>>> >>>> R-sig-meta-analysis using r-project.org
>>>>>> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
>>>>>> >>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Dr. rer. nat. Gerta Rücker, Dipl.-Math.
>>>>>>
>>>>>> Institute of Medical Biometry and Statistics,
>>>>>> Faculty of Medicine and Medical Center - University of Freiburg
>>>>>>
>>>>>> Stefan-Meier-Str. 26, D-79104 Freiburg, Germany
>>>>>>
>>>>>> Phone:    +49/761/203-6673
>>>>>> Fax:      +49/761/203-6680
>>>>>> Mail:     ruecker using imbi.uni-freiburg.de
>>>>>> Homepage: https://www.uniklinik-freiburg.de/imbi.html
>>>>>>
>>>>>>

```
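The question at the top of the thread (is the contextual effect for a three-category predictor a difference of sums?) can be explored numerically. What follows is a minimal base-R sketch, not code from the thread. The data frame `dat`, the `diet` categories, and the coefficient values are all made up for illustration, and plain `lm()` stands in for a proper meta-regression such as `robu()`. Since `b_wthn * (x - xbar) + b_btw * xbar` equals `b_wthn * x + (b_btw - b_wthn) * xbar`, the sketch suggests that each non-reference category gets its own contextual effect, `b_btw - b_wthn`, rather than a single difference of sums.

```r
# Toy data (hypothetical): 4 studies, one row per effect size, with a
# three-level categorical moderator. "omni" is the reference category.
dat <- data.frame(
  study = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  diet  = c("omni", "veg", "pesc", "omni", "veg",
            "veg", "veg", "pesc", "omni", "omni")
)

# 1) Dummy variables for the non-reference categories.
dat$veg  <- as.numeric(dat$diet == "veg")
dat$pesc <- as.numeric(dat$diet == "pesc")

# 2) Between-study predictors: study-level means (proportions) of each dummy.
#    ave() returns the group mean repeated for every row of the group.
dat$veg_btw  <- ave(dat$veg,  dat$study)
dat$pesc_btw <- ave(dat$pesc, dat$study)

# 3) Within-study predictors: dummies centered at their study means.
dat$veg_wthn  <- dat$veg  - dat$veg_btw
dat$pesc_wthn <- dat$pesc - dat$pesc_btw

# Made-up coefficients, used only to generate noise-free illustrative outcomes.
dat$y <- 0.2 + 0.3 * dat$veg_wthn + 0.1 * dat$pesc_wthn +
  0.5 * dat$veg_btw + 0.4 * dat$pesc_btw

# Within/between model, as in the thread but fitted with lm() for simplicity.
fit <- lm(y ~ veg_wthn + pesc_wthn + veg_btw + pesc_btw, data = dat)
b <- coef(fit)

# One contextual effect per non-reference category: between minus within.
ctx_veg  <- unname(b["veg_btw"]  - b["veg_wthn"])
ctx_pesc <- unname(b["pesc_btw"] - b["pesc_wthn"])

# Equivalent parameterization: with uncentered dummies plus study means,
# the coefficients on the study means are the contextual effects directly.
fit2 <- lm(y ~ veg + pesc + veg_btw + pesc_btw, data = dat)
```

In this sketch `ctx_veg` and `ctx_pesc` match the `veg_btw` and `pesc_btw` coefficients of `fit2`, which is one way to read off category-specific contextual effects without subtracting by hand. The between-study predictors are on the proportion (0-1) scale here; multiplying them by 100 would put the contextual effects on the percentage-point scale discussed in the thread.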