[R-meta] "Categorical" moderator varying within and between studies

James Pustejovsky jepusto using gmail.com
Thu Oct 29 20:09:30 CET 2020


My apologies! I had this backwards in my head. Revised explanation below:

With gender, if you include the group-mean-centered dummy variables and the
cluster-level means, then the contextual effect will be as you described
(gender_M_btw - gender_M_wthn). However, another approach would be to leave
the dummy variables uncentered. If you do it this way, then the coefficient
on gender_M_btw corresponds exactly to the contextual effect, with no need
to subtract out the coefficient on gender_M_wthn.
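
A quick way to see why, in the same plain-text notation as the meta-regression
equations further down the thread (u_j is the school random intercept). The
group-mean-centered parameterization is

math_ij = b0 + b1 (gender_M_wthn)_ij + b2 (gender_M_btw)_j + u_j + e_ij

and since (gender_M_wthn)_ij = (gender_M)_ij - (gender_M_btw)_j, substituting
and collecting terms gives

math_ij = b0 + b1 (gender_M)_ij + (b2 - b1) (gender_M_btw)_j + u_j + e_ij.

So with the uncentered dummy, the coefficient on the cluster mean is exactly
the contextual effect, b2 - b1.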

R code verifying the equivalence of these approaches:

library(dplyr)
library(fastDummies)
library(lme4)

hsb <- read.csv("
https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv")

hsb2 <- hsb %>%
  mutate(gender = ifelse(female == 0, "M", "F")) %>%  # create 'gender' from variable 'female'
  dummy_columns(select_columns = "gender") %>%        # create dummies for 'gender' (creates 2 but we need 1)
  group_by(sch.id) %>%                                # group by cluster id 'sch.id'
  mutate(across(starts_with("gender_"),
                list(wthn = ~ . - mean(.), btw = ~ mean(.))))
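
If helpful, a quick optional peek at the constructed columns (just to confirm
what the pipeline produced; the exact set and ordering of columns may differ):

hsb2 %>%
  ungroup() %>%
  select(sch.id, female, gender_M, gender_M_wthn, gender_M_btw) %>%
  head()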

mg_b_w <- lmer(math ~ gender_M_wthn + gender_M_btw + (1|sch.id), data = hsb2)

mg_b_d <- lmer(math ~ gender_M + gender_M_btw + (1|sch.id), data = hsb2)

fixef(mg_b_w)[["gender_M_btw"]] - fixef(mg_b_w)[["gender_M_wthn"]]
fixef(mg_b_d)[["gender_M_btw"]]
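
And, if it helps, a quick check that the two quantities line up (they should
agree up to the optimizer's numerical tolerance, since the two models are just
reparameterizations of the same fit):

all.equal(
  fixef(mg_b_w)[["gender_M_btw"]] - fixef(mg_b_w)[["gender_M_wthn"]],
  fixef(mg_b_d)[["gender_M_btw"]]
)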

On Thu, Oct 29, 2020 at 1:57 PM Simon Harmel <sim.harmel using gmail.com> wrote:

> Thank you, James. For uniformity, I always (i.e., for both categorical &
> numeric predictors) use the following method (using a dataset I found on
> Stack Overflow).
>
> So, in the case below, you're saying  gender_M_btw is the contextual
> effect itself?
>
> Simon
>
> library(dplyr)
> library(fastDummies)
> library(lme4)
>
> hsb <- read.csv("
> https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv")
>
> hsb2 <- hsb %>%
>   mutate(gender = ifelse(female == 0, "M", "F")) %>%  # create 'gender' from variable 'female'
>   dummy_columns(select_columns = "gender") %>%        # create dummies for 'gender' (creates 2 but we need 1)
>   group_by(sch.id) %>%                                # group by cluster id 'sch.id'
>   mutate(across(starts_with("gender_"),
>                 list(wthn = ~ . - mean(.), btw = ~ mean(.))))
>
> mg_b_w <- lmer(math ~ gender_M_wthn + gender_M_btw + (1|sch.id), data = hsb2)
>
> On Thu, Oct 29, 2020 at 1:31 PM James Pustejovsky <jepusto using gmail.com>
> wrote:
>
>> Hi Simon,
>>
>> There are different ways to parameterize contextual effects. With gender,
>> if you include the regular dummy variables (without group-mean-centering)
>> plus the cluster-level means, then the contextual effect will be as you
>> described (gender_M_btw - gender_M_wthn). However, another approach would
>> be to first group-mean-center the dummy variables. In this approach, for a
>> male student, gender_M_wthn would be equal to 1 minus the proportion of
>> male students in the cluster, and for a female student, gender_M_wthn would
>> be equal to the negative of the proportion of male students in the cluster.
>> If you do it this way, then the coefficient on gender_M_btw corresponds
>> exactly to the contextual effect, with no need to subtract out the
>> coefficient on gender_M_wthn.
>>
>> All that said, if you have more than two categories you will have more
>> than one contextual effect. In your example, you have a contextual effect
>> for M, which would be the average difference in the DV between two units
>> who are both male, but belong to clusters that differ by 1 percentage point
>> in the composition of males *and have the same proportion of
>> other-gender students* (i.e., clusters that have 1 percentage point
>> difference in males, and a -1 percentage point difference in females). And
>> then you have a contextual effect for other, corresponding to the average
>> difference in the DV between two units who are both other-gender, but
>> belong to clusters that differ by 1 percentage point in the composition of
>> other *and have the same proportion of male-gender students* (i.e.,
>> clusters that have 1 percentage point difference in other, and a -1
>> percentage point difference in females).
>>
>> James
>>
>> On Thu, Oct 29, 2020 at 12:24 PM Simon Harmel <sim.harmel using gmail.com>
>> wrote:
>>
>>> Dear James,
>>>
>>> This makes perfect sense, many thanks. However, one thing remains. I
>>> know the contextual effect coefficient is "b_btw - b_wthn". If we have two
>>> categories (as in the case of "gender") and take females as the
>>> reference category, then the contextual effect coefficient will be:
>>>
>>> gender_M_btw  - gender_M_wthn
>>>
>>> But if we have more than two categories (say we add a third "gender"
>>> category called OTHER), then will the contextual effect coefficient be (sum
>>> of the betweens) - (sum of the withins)?
>>>
>>>   (gender_M_btw + gender_OTHER_btw) - (gender_M_wthn + gender_OTHER_wthn)
>>>
>>>
>>>
>>> On Thu, Oct 29, 2020 at 9:44 AM James Pustejovsky <jepusto using gmail.com>
>>> wrote:
>>>
>>>> Hi Simon,
>>>>
>>>> With a binary or categorical predictor, one could operationalize the
>>>> contextual effect in terms of proportions (0-1 scale) or percentages (0-100
>>>> scale). If proportions, like say proportion of vegetarians, then the
>>>> contextual effect would be the average difference in the DV between two
>>>> units who are both vegetarian (i.e., have the same value of the predictor),
>>>> but belong to clusters that are all vegetarian versus all omnivorous (i.e.,
>>>> that differ by one unit in the proportion for that predictor). That will
>>>> make the contextual effects look quite large because it's an extreme
>>>> comparison--absurdly so, in this case, since there can't be a vegetarian in
>>>> a cluster of all omnivores.
>>>>
>>>> If you operationalize the contextual effect in terms of percentages
>>>> (e.g., % vegetarians) then you get the average difference in the DV
>>>> between two units who are both vegetarian, but belong to clusters that
>>>> differ by 1 percentage point in the proportion of vegetarians.
>>>>
>>>> All of this works for multi-category predictors also. Say that you had
>>>> vegetarians, pescatarians, and omnivores, with omnivores as the reference
>>>> category, then the model would include group-mean-centered dummy variables
>>>> for vegetarians and pescatarians, plus group-mean predictors representing
>>>> the proportion/percentage of vegetarians and proportion/percentage of
>>>> pescatarians. You have to omit one category at each level to avoid
>>>> collinearity with the intercept.
>>>>
>>>> James
>>>>
>>>> On Thu, Oct 29, 2020 at 1:32 AM Simon Harmel <sim.harmel using gmail.com>
>>>> wrote:
>>>>
>>>>> Dear James,
>>>>>
>>>>> I'm returning to this after a while, a quick question. In your gender
>>>>> example, you used the term "%female" in your interpretation of the
>>>>> contextual effect. If the categorical predictor had more than 2 categories,
>>>>> then would you still use the term % in your interpretation?
>>>>>
>>>>> My understanding of contextual effect is below:
>>>>>
>>>>> Contextual effect is the average difference in the DV between two
>>>>> units (e.g., subjects) which have the same value on an IV (e.g., same
>>>>> gender), but belong to clusters (e.g., schools) whose mean/percentage on
>>>>> that IV differs by one unit (is the unit a percentage if the IV is categorical?).
>>>>>
>>>>> Thank you, Simon
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Jun 7, 2020 at 7:30 AM James Pustejovsky <jepusto using gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes, it’s general and also applies outside the context of
>>>>>> meta-analysis. See for example Raudenbush & Bryk (2002) for a good
>>>>>> discussion on centering and contextual effects in hierarchical linear
>>>>>> models.
>>>>>>
>>>>>> On Jun 6, 2020, at 11:07 PM, Simon Harmel <sim.harmel using gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Many thanks James. A quick follow-up. The strategy that you described
>>>>>> is a general regression modeling strategy, right? I mean even if we were
>>>>>> fitting a multi-level model, the fixed-effects part of the formula would
>>>>>> have to include the same construction (i.e., *b1 (% female-within)_ij +
>>>>>> b2 (% female-between)_j*)?
>>>>>>
>>>>>> Thanks,
>>>>>> Simon
>>>>>>
>>>>>> On Thu, Jun 4, 2020 at 9:42 AM James Pustejovsky <jepusto using gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Simon,
>>>>>>>
>>>>>>> Please keep the listserv cc'd so that others can benefit from these
>>>>>>> discussions.
>>>>>>>
>>>>>>> Unfortunately, I don't think there is any single answer to your
>>>>>>> question---analytic strategies just depend too much on what your research
>>>>>>> questions are and the substantive context that you're working in.
>>>>>>>
>>>>>>> But speaking generally, the advantages of splitting predictors into
>>>>>>> within- and between-study versions are two-fold. First is that doing this
>>>>>>> provides an understanding of the structure of the data you're working with,
>>>>>>> in that it forces one to consider *which* predictors have
>>>>>>> within-study variation and *how much* variation there is (e.g.,
>>>>>>> perhaps many studies have looked at internalizing symptoms, many studies
>>>>>>> have looked at externalizing symptoms, but only a few have looked at both
>>>>>>> types of outcomes in the same sample). The second advantage is that
>>>>>>> within-study predictors have a distinct interpretation from between-study
>>>>>>> predictors, and the within-study version is often theoretically more
>>>>>>> interesting/salient. That's because comparisons of effect sizes based on
>>>>>>> within-study variation hold constant other aspects of the studies that
>>>>>>> could influence effect size (and that could muddy the interpretation of the
>>>>>>> moderator).
>>>>>>>
>>>>>>> Here is an example that comes up often in research synthesis
>>>>>>> projects. Suppose that you're interested in whether participant sex
>>>>>>> moderates the effect of some intervention. Most of the studies in the
>>>>>>> sample are of type A, such that only aggregated effect sizes can be
>>>>>>> calculated. For these type A studies, we are able to determine a) the
>>>>>>> average effect size across the full sample (pooling across sex) and b) the
>>>>>>> sex composition of the sample (e.g., % female). For a smaller number of
>>>>>>> studies of type B, we are able to obtain dis-aggregated results for
>>>>>>> subgroups of male and female participants. For these studies, we are able
>>>>>>> to determine a) the average effect size for males and b) the average effect
>>>>>>> size for females, plus c) the sex composition of each of the sub-samples
>>>>>>> (respectively 0% and 100% female).
>>>>>>>
>>>>>>> Without considering within/between variation in the predictor, a
>>>>>>> meta-regression testing for whether sex is a moderator is:
>>>>>>>
>>>>>>> Y_ij = b0 + b1 (% female)_ij + e_ij
>>>>>>>
>>>>>>> The coefficient b1 describes how effect size magnitude varies across
>>>>>>> samples that differ by 1% in the percent of females. But the estimate of
>>>>>>> this coefficient pools information across studies of type A and studies of
>>>>>>> type B, essentially assuming that the contextual effects (variance
>>>>>>> explained by sample composition) are the same as the individual-level
>>>>>>> moderator effects (how the intervention effect varies between males and
>>>>>>> females).
>>>>>>>
>>>>>>> Now, if we use the within/between decomposition, the meta-regression
>>>>>>> becomes:
>>>>>>>
>>>>>>> Y_ij = b0 + b1 (% female-within)_ij + b2 (% female-between)_j + e_ij
>>>>>>>
>>>>>>> In this model, b1 will be estimated *using only the studies of type
>>>>>>> B*, as an average of the moderator effects for the studies that
>>>>>>> provide dis-aggregated data. And b2 will be estimated using studies of type
>>>>>>> A and the study-level average % female in studies of type B. Thus b2 can be
>>>>>>> interpreted as a pure contextual effect (variance explained by sample
>>>>>>> composition). Why does this matter? It's because contextual effects usually
>>>>>>> have a much murkier interpretation than individual-level moderator effects.
>>>>>>> Maybe this particular intervention has been tested for several different
>>>>>>> professions (e.g., education, nursing, dentistry, construction), and
>>>>>>> professions that tend to have higher proportions of females are also those
>>>>>>> that tend to be lower-status. If there is a positive contextual effect for
>>>>>>> % female, then it might be that a) the intervention really is more
>>>>>>> effective for females than for males or b) the intervention is equally
>>>>>>> effective for males and females but tends to work better when used with
>>>>>>> lower-status professions. Looking at between/within study variance in the
>>>>>>> predictor lets us disentangle those possibilities, at least partially.
>>>>>>>
>>>>>>> James
>>>>>>>
>>>>>>> On Wed, Jun 3, 2020 at 9:27 AM Simon Harmel <sim.harmel using gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Indeed that was the problem, Gerta. Thanks.
>>>>>>>>
>>>>>>>> But James, in meta-analysis having multiple categorical variables
>>>>>>>> each with several levels is very pervasive and they often vary both
>>>>>>>> within and between studies.
>>>>>>>>
>>>>>>>> So, if we need to do this for each level of each such categorical
>>>>>>>> variable, it would certainly become a daunting task, in addition to
>>>>>>>> making the model extremely big.
>>>>>>>>
>>>>>>>> My follow-up question is: what is your strategy after you create the
>>>>>>>> within and between dummies for each such categorical variable? What
>>>>>>>> are the next steps?
>>>>>>>>
>>>>>>>> Thank you very much, Simon
>>>>>>>>
>>>>>>>> p.s. After your `robu()` call I get: `Warning message: In
>>>>>>>> sqrt(eigenval) : NaNs produced`
>>>>>>>>
>>>>>>>> On Wed, Jun 3, 2020 at 8:45 AM Gerta Ruecker <
>>>>>>>> ruecker using imbi.uni-freiburg.de> wrote:
>>>>>>>>
>>>>>>>>> Simon
>>>>>>>>>
>>>>>>>>> Maybe there should not be a line break between "Relative" and
>>>>>>>>> "Rating"?
>>>>>>>>>
>>>>>>>>> For characters, for example if they are used as legends, line breaks
>>>>>>>>> sometimes matter.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Gerta
>>>>>>>>>
>>>>>>>>> Am 03.06.2020 um 15:32 schrieb James Pustejovsky:
>>>>>>>>> > I'm not sure what produced that error and I cannot reproduce it.
>>>>>>>>> > It may have something to do with the version of dplyr. Here's an
>>>>>>>>> > alternative way to recode the Scoring variable, which might be
>>>>>>>>> > less prone to versioning differences:
>>>>>>>>> >
>>>>>>>>> > library(dplyr)
>>>>>>>>> > library(fastDummies)
>>>>>>>>> > library(robumeta)
>>>>>>>>> >
>>>>>>>>> > data("oswald2013")
>>>>>>>>> >
>>>>>>>>> > oswald_centered <-
>>>>>>>>> >    oswald2013 %>%
>>>>>>>>> >
>>>>>>>>> >    # make dummy variables
>>>>>>>>> >    mutate(
>>>>>>>>> >      Scoring = factor(Scoring,
>>>>>>>>> >                       levels = c("Absolute", "Difference Score", "Relative Rating"),
>>>>>>>>> >                       labels = c("Absolute", "Difference", "Relative"))
>>>>>>>>> >    ) %>%
>>>>>>>>> >    dummy_columns(select_columns = "Scoring") %>%
>>>>>>>>> >
>>>>>>>>> >    # centering by study
>>>>>>>>> >    group_by(Study) %>%
>>>>>>>>> >    mutate_at(vars(starts_with("Scoring_")),
>>>>>>>>> >              list(wthn = ~ . - mean(.), btw = ~ mean(.))) %>%
>>>>>>>>> >
>>>>>>>>> >    # calculate Fisher Z and variance
>>>>>>>>> >    mutate(
>>>>>>>>> >      Z = atanh(R),
>>>>>>>>> >      V = 1 / (N - 3)
>>>>>>>>> >    )
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > # Use the predictors in a meta-regression model
>>>>>>>>> > # with Scoring = Absolute as the omitted category
>>>>>>>>> >
>>>>>>>>> > robu(Z ~ Scoring_Difference_wthn + Scoring_Relative_wthn +
>>>>>>>>> >         Scoring_Difference_btw + Scoring_Relative_btw,
>>>>>>>>> >       data = oswald_centered, studynum = Study, var.eff.size = V)
>>>>>>>>> >
>>>>>>>>> > On Tue, Jun 2, 2020 at 10:20 PM Simon Harmel <sim.harmel using gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> >> Many thanks, James! I keep getting the following error when I
>>>>>>>>> >> run your code:
>>>>>>>>> >>
>>>>>>>>> >> Error: unexpected symbol in:
>>>>>>>>> >> "Rating" = "Relative")
>>>>>>>>> >> oswald_centered"
>>>>>>>>> >>
>>>>>>>>> >> On Tue, Jun 2, 2020 at 10:00 PM James Pustejovsky <jepusto using gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >>> Hi Simon,
>>>>>>>>> >>>
>>>>>>>>> >>> The same strategy can be followed by using dummy variables for
>>>>>>>>> >>> each unique level of a categorical moderator. The idea would be
>>>>>>>>> >>> to 1) create dummy variables for each category, 2) calculate the
>>>>>>>>> >>> study-level means of the dummy variables (between-cluster
>>>>>>>>> >>> predictors), and 3) calculate the group-mean centered dummy
>>>>>>>>> >>> variables (within-cluster predictors). Just like if you're
>>>>>>>>> >>> working with regular categorical predictors, you'll have to pick
>>>>>>>>> >>> one reference level to omit when using these sets of predictors.
>>>>>>>>> >>>
>>>>>>>>> >>> Here is an example of how to carry out such calculations in R,
>>>>>>>>> >>> using the fastDummies package along with a bit of dplyr:
>>>>>>>>> >>>
>>>>>>>>> >>> library(dplyr)
>>>>>>>>> >>> library(fastDummies)
>>>>>>>>> >>> library(robumeta)
>>>>>>>>> >>>
>>>>>>>>> >>> data("oswald2013")
>>>>>>>>> >>>
>>>>>>>>> >>> oswald_centered <-
>>>>>>>>> >>>    oswald2013 %>%
>>>>>>>>> >>>
>>>>>>>>> >>>    # make dummy variables
>>>>>>>>> >>>    mutate(
>>>>>>>>> >>>      Scoring = recode(Scoring, "Difference Score" = "Difference",
>>>>>>>>> >>>                       "Relative Rating" = "Relative")
>>>>>>>>> >>>    ) %>%
>>>>>>>>> >>>    dummy_columns(select_columns = "Scoring") %>%
>>>>>>>>> >>>
>>>>>>>>> >>>    # centering by study
>>>>>>>>> >>>    group_by(Study) %>%
>>>>>>>>> >>>    mutate_at(vars(starts_with("Scoring_")),
>>>>>>>>> >>>              list(wthn = ~ . - mean(.), btw = ~ mean(.))) %>%
>>>>>>>>> >>>
>>>>>>>>> >>>    # calculate Fisher Z and variance
>>>>>>>>> >>>    mutate(
>>>>>>>>> >>>      Z = atanh(R),
>>>>>>>>> >>>      V = 1 / (N - 3)
>>>>>>>>> >>>    )
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> # Use the predictors in a meta-regression model
>>>>>>>>> >>> # with Scoring = Absolute as the omitted category
>>>>>>>>> >>>
>>>>>>>>> >>> robu(Z ~ Scoring_Difference_wthn + Scoring_Relative_wthn +
>>>>>>>>> >>>        Scoring_Difference_btw + Scoring_Relative_btw,
>>>>>>>>> >>>      data = oswald_centered, studynum = Study, var.eff.size = V)
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> Kind Regards,
>>>>>>>>> >>> James
>>>>>>>>> >>>
>>>>>>>>> >>> On Tue, Jun 2, 2020 at 6:49 PM Simon Harmel <sim.harmel using gmail.com> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>>> Hi All,
>>>>>>>>> >>>>
>>>>>>>>> >>>> Page 13 of *THIS ARTICLE
>>>>>>>>> >>>> <https://cran.r-project.org/web/packages/robumeta/vignettes/robumetaVignette.pdf>*
>>>>>>>>> >>>> (*top of the page*) recommends that if a *continuous moderator*
>>>>>>>>> >>>> varies both within and across studies in a meta-analysis, a
>>>>>>>>> >>>> strategy is to break that moderator down into two moderators by:
>>>>>>>>> >>>>
>>>>>>>>> >>>> *(a)* taking the mean of each study (between-cluster effect),
>>>>>>>>> >>>>
>>>>>>>>> >>>> *(b)* centering the predictor within each study (within-cluster
>>>>>>>>> >>>> effect).
>>>>>>>>> >>>>
>>>>>>>>> >>>> BUT what if my original moderator that varies both within and
>>>>>>>>> >>>> across studies is a *"categorical"* moderator?
>>>>>>>>> >>>>
>>>>>>>>> >>>> I would appreciate an R demonstration of the recommended strategy.
>>>>>>>>> >>>> Thanks,
>>>>>>>>> >>>> Simon
>>>>>>>>> >>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Dr. rer. nat. Gerta Rücker, Dipl.-Math.
>>>>>>>>>
>>>>>>>>> Institute of Medical Biometry and Statistics,
>>>>>>>>> Faculty of Medicine and Medical Center - University of Freiburg
>>>>>>>>>
>>>>>>>>> Stefan-Meier-Str. 26, D-79104 Freiburg, Germany
>>>>>>>>>
>>>>>>>>> Phone:    +49/761/203-6673
>>>>>>>>> Fax:      +49/761/203-6680
>>>>>>>>> Mail:     ruecker using imbi.uni-freiburg.de
>>>>>>>>> Homepage: https://www.uniklinik-freiburg.de/imbi.html
>>>>>>>>>
>>>>>>>>>
