[R-meta] "Categorical" moderator varying within and between studies

Simon Harmel sim.harmel at gmail.com
Thu Oct 29 19:57:20 CET 2020


Thank you, James. For uniformity, I always (i.e., for both categorical &
numeric predictors) use the following method (using a dataset I found on
Stack Overflow).

So, in the case below, you're saying gender_M_btw is the contextual effect
itself?

Simon

library(dplyr)
library(fastDummies)
library(lme4)

hsb <- read.csv("https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv")

hsb2 <- hsb %>%
  mutate(gender = ifelse(female == 0, "M", "F")) %>%  # create 'gender' from variable 'female'
  dummy_columns(select_columns = "gender") %>%        # create dummies for 'gender' (creates 2 but we need 1)
  group_by(sch.id) %>%                                # group by cluster id 'sch.id'
  mutate(across(starts_with("gender_"), list(wthn = ~ . - mean(.), btw = ~ mean(.))))

mg_b_w <- lmer(math ~ gender_M_wthn + gender_M_btw + (1 | sch.id), data = hsb2)
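
(For reference, the quantity I have been treating as the contextual effect
from this fit is the difference between the two coefficients; a minimal
sketch, assuming the mg_b_w model above:)

# contextual effect as b_btw - b_wthn, extracted from the lme4 fit
fixef(mg_b_w)["gender_M_btw"] - fixef(mg_b_w)["gender_M_wthn"]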

On Thu, Oct 29, 2020 at 1:31 PM James Pustejovsky <jepusto using gmail.com> wrote:

> Hi Simon,
>
> There are different ways to parameterize contextual effects. With gender,
> if you first group-mean-center the dummy variables and include the centered
> dummies plus the cluster-level means, then the contextual effect is obtained
> as you described (gender_M_btw - gender_M_wthn). In this approach, for a
> male student, gender_M_wthn is equal to 1 minus the proportion of male
> students in the cluster, and for a female student, gender_M_wthn is equal to
> the negative of the proportion of male students in the cluster. However,
> another approach would be to include the regular dummy variables (without
> group-mean-centering) plus the cluster-level means. If you do it that way,
> then the coefficient on gender_M_btw corresponds exactly to the contextual
> effect, with no need to subtract out the coefficient on the uncentered dummy.
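>
> As a quick numerical check (a minimal sketch, assuming the hsb2 data frame
> constructed at the top of this thread, which keeps the uncentered dummy
> gender_M alongside the centered versions):
>
> # Parameterization 1: uncentered dummy plus cluster mean
> m_raw  <- lmer(math ~ gender_M + gender_M_btw + (1 | sch.id), data = hsb2)
>
> # Parameterization 2: group-mean-centered dummy plus cluster mean
> m_cent <- lmer(math ~ gender_M_wthn + gender_M_btw + (1 | sch.id), data = hsb2)
>
> # The two models are re-parameterizations of the same fit, so the coefficient
> # on gender_M_btw in m_raw equals (btw - wthn) from m_cent:
> fixef(m_raw)["gender_M_btw"]
> fixef(m_cent)["gender_M_btw"] - fixef(m_cent)["gender_M_wthn"]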
>
> All that said, if you have more than two categories you will have more
> than one contextual effect. In your example, you have a contextual effect
> for M, which would be the average difference in the DV between two units
> who are both male, but belong to clusters that differ by 1 percentage point
> in the composition of males *and have the same proportion of other-gender
> students* (i.e., clusters that have a 1 percentage point difference in
> males and a -1 percentage point difference in females). And then you have
> a contextual effect for other, corresponding to the average difference in
> the DV between two units who are both other-gender, but belong to clusters
> that differ by 1 percentage point in the composition of other *and have the
> same proportion of male students* (i.e., clusters that have a 1 percentage
> point difference in other and a -1 percentage point difference in females).
>
> James
>
> On Thu, Oct 29, 2020 at 12:24 PM Simon Harmel <sim.harmel using gmail.com>
> wrote:
>
>> Dear James,
>>
>> This makes perfect sense, many thanks. However, one thing remains. I know
>> the contextual effect coefficient is "b_btw - b_wthn". If we have two
>> categories (as in the case of "gender") and take females as the
>> reference category, then the contextual effect coefficient will be:
>>
>> gender_M_btw  - gender_M_wthn
>>
>> But if we have more than two categories (say we add a third "gender"
>> category called OTHER), then will the contextual effect coefficient be (sum
>> of the betweens) - (sum of the withins)?
>>
>>   (gender_M_btw + gender_OTHER_btw) - (gender_M_wthn + gender_OTHER_wthn)
>>
>>
>>
>> On Thu, Oct 29, 2020 at 9:44 AM James Pustejovsky <jepusto using gmail.com>
>> wrote:
>>
>>> Hi Simon,
>>>
>>> With a binary or categorical predictor, one could operationalize the
>>> contextual effect in terms of proportions (0-1 scale) or percentages (0-100
>>> scale). If proportions, like say proportion of vegetarians, then the
>>> contextual effect would be the average difference in the DV between two
>>> units who are both vegetarian (i.e., have the same value of the predictor),
>>> but belong to clusters that are all vegetarian versus all omnivorous (i.e.,
>>> that differ by one unit in the proportion for that predictor). That will
>>> make the contextual effects look quite large because it's an extreme
>>> comparison--absurdly so, in this case, since there can't be a vegetarian in
>>> a cluster of all omnivores.
>>>
>>> If you operationalize the contextual effect in terms of percentages
>>> (e.g., % vegetarians) then you get the average difference in the DV
>>> between two units who are both vegetarian, but belong to clusters that
>>> differ by 1 percentage point in the proportion of vegetarians.
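>>>
>>> As a minimal sketch of the two scalings (hypothetical data frame dat with a
>>> 0/1 indicator vegetarian and a cluster id; the names are made up):
>>>
>>> library(dplyr)
>>> dat <- dat %>%
>>>   group_by(cluster) %>%
>>>   mutate(prop_veg = mean(vegetarian),    # 0-1 scale: coefficient per unit proportion
>>>          pct_veg  = 100 * prop_veg) %>%  # 0-100 scale: coefficient per percentage point
>>>   ungroup()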
>>>
>>> All of this works for multi-category predictors also. Say that you had
>>> vegetarians, pescatarians, and omnivores, with omnivores as the reference
>>> category, then the model would include group-mean-centered dummy variables
>>> for vegetarians and pescatarians, plus group-mean predictors representing
>>> the proportion/percentage of vegetarians and proportion/percentage of
>>> pescatarians. You have to omit one category at each level to avoid
>>> collinearity with the intercept.
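>>>
>>> For instance, a minimal sketch (hypothetical data frame dat with outcome y,
>>> a three-level factor diet, and a cluster id; omnivore as the reference):
>>>
>>> library(dplyr)
>>> library(fastDummies)
>>> library(lme4)
>>>
>>> dat_c <- dat %>%
>>>   dummy_columns(select_columns = "diet") %>%  # diet_omnivore, diet_pescatarian, diet_vegetarian
>>>   group_by(cluster) %>%
>>>   mutate(across(starts_with("diet_"),
>>>                 list(wthn = ~ . - mean(.), btw = ~ mean(.)))) %>%
>>>   ungroup()
>>>
>>> # omit the omnivore dummies at both levels to avoid collinearity with the intercept
>>> lmer(y ~ diet_vegetarian_wthn + diet_pescatarian_wthn +
>>>        diet_vegetarian_btw + diet_pescatarian_btw + (1 | cluster), data = dat_c)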
>>>
>>> James
>>>
>>> On Thu, Oct 29, 2020 at 1:32 AM Simon Harmel <sim.harmel using gmail.com>
>>> wrote:
>>>
>>>> Dear James,
>>>>
>>>> I'm returning to this after a while, a quick question. In your gender
>>>> example, you used the term "%female" in your interpretation of the
>>>> contextual effect. If the categorical predictor had more than 2 categories,
>>>> then would you still use the term % in your interpretation?
>>>>
>>>> My understanding of contextual effect is below:
>>>>
>>>> The contextual effect is the average difference in the DV between two units
>>>> (e.g., subjects) that have the same value on an IV (e.g., the same gender),
>>>> but belong to clusters (e.g., schools) whose mean/percentage on that IV
>>>> differs by one unit (is the unit a percentage point if the IV is categorical?).
>>>>
>>>> Thank you, Simon
>>>>
>>>>
>>>>
>>>> On Sun, Jun 7, 2020 at 7:30 AM James Pustejovsky <jepusto using gmail.com>
>>>> wrote:
>>>>
>>>>> Yes, it’s general and also applies outside the context of
>>>>> meta-analysis. See for example Raudenbush & Bryk (2002) for a good
>>>>> discussion on centering and contextual effects in hierarchical linear
>>>>> models.
>>>>>
>>>>> On Jun 6, 2020, at 11:07 PM, Simon Harmel <sim.harmel using gmail.com>
>>>>> wrote:
>>>>>
>>>>> Many thanks, James. A quick follow-up. The strategy that you described
>>>>> is a general regression-modeling strategy, right? I mean, even if we were
>>>>> fitting a multi-level model, the fixed-effects part of the formula would
>>>>> have to include the same construction (i.e., *b1 (% female-within)_ij +
>>>>> b2 (% female-between)_j*)?
>>>>>
>>>>> Thanks,
>>>>> Simon
>>>>>
>>>>> On Thu, Jun 4, 2020 at 9:42 AM James Pustejovsky <jepusto using gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Simon,
>>>>>>
>>>>>> Please keep the listserv cc'd so that others can benefit from these
>>>>>> discussions.
>>>>>>
>>>>>> Unfortunately, I don't think there is any single answer to your
>>>>>> question---analytic strategies just depend too much on what your research
>>>>>> questions are and the substantive context that you're working in.
>>>>>>
>>>>>> But speaking generally, the advantages of splitting predictors into
>>>>>> within- and between-study versions are two-fold. First is that doing this
>>>>>> provides an understanding of the structure of the data you're working with,
>>>>>> in that it forces one to consider *which* predictors have
>>>>>> within-study variation and *how much* variation there is (e.g.,
>>>>>> perhaps many studies have looked at internalizing symptoms, many studies
>>>>>> have looked at externalizing symptoms, but only a few have looked at both
>>>>>> types of outcomes in the same sample). The second advantage is that
>>>>>> within-study predictors have a distinct interpretation from between-study
>>>>>> predictors, and the within-study version is often theoretically more
>>>>>> interesting/salient. That's because comparisons of effect sizes based on
>>>>>> within-study variation hold constant other aspects of the studies that
>>>>>> could influence effect size (and that could muddy the interpretation of the
>>>>>> moderator).
>>>>>>
>>>>>> Here is an example that comes up often in research synthesis
>>>>>> projects. Suppose that you're interested in whether participant sex
>>>>>> moderates the effect of some intervention. Most of the studies in the
>>>>>> sample are of type A, such that only aggregated effect sizes can be
>>>>>> calculated. For these type A studies, we are able to determine a) the
>>>>>> average effect size across the full sample (pooling across sex) and b) the
>>>>>> sex composition of the sample (e.g., % female). For a smaller number of
>>>>>> studies of type B, we are able to obtain dis-aggregated results for
>>>>>> subgroups of male and female participants. For these studies, we are able
>>>>>> to determine a) the average effect size for males and b) the average effect
>>>>>> size for females, plus c) the sex composition of each of the sub-samples
>>>>>> (respectively 0% and 100% female).
>>>>>>
>>>>>> Without considering within/between variation in the predictor, a
>>>>>> meta-regression testing for whether sex is a moderator is:
>>>>>>
>>>>>> Y_ij = b0 + b1 (% female)_ij + e_ij
>>>>>>
>>>>>> The coefficient b1 describes how effect size magnitude varies across
>>>>>> samples that differ by 1% in the percent of females. But the estimate of
>>>>>> this coefficient pools information across studies of type A and studies of
>>>>>> type B, essentially assuming that the contextual effects (variance
>>>>>> explained by sample composition) are the same as the individual-level
>>>>>> moderator effects (how the intervention effect varies between males and
>>>>>> females).
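>>>>>>
>>>>>> In code, a minimal sketch of that pooled model (hypothetical data frame
>>>>>> dat with effect size yi, variance vi, a study id, and pct_female):
>>>>>>
>>>>>> library(robumeta)
>>>>>> robu(yi ~ pct_female, data = dat, studynum = study, var.eff.size = vi)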
>>>>>>
>>>>>> Now, if we use the within/between decomposition, the meta-regression
>>>>>> becomes:
>>>>>>
>>>>>> Y_ij = b0 + b1 (% female-within)_ij + b2 (% female-between)_j + e_ij
>>>>>>
>>>>>> In this model, b1 will be estimated *using only the studies of type
>>>>>> B*, as an average of the moderator effects for the studies that
>>>>>> provide dis-aggregated data. And b2 will be estimated using studies of type
>>>>>> A and the study-level average % female in studies of type B. Thus b2 can be
>>>>>> interpreted as a pure contextual effect (variance explained by sample
>>>>>> composition). Why does this matter? It's because contextual effects usually
>>>>>> have a much murkier interpretation than individual-level moderator effects.
>>>>>> Maybe this particular intervention has been tested for several different
>>>>>> professions (e.g., education, nursing, dentistry, construction), and
>>>>>> professions that tend to have higher proportions of females are also those
>>>>>> that tend to be lower-status. If there is a positive contextual effect for
>>>>>> % female, then it might be that a) the intervention really is more
>>>>>> effective for females than for males or b) the intervention is equally
>>>>>> effective for males and females but tends to work better when used with
>>>>>> lower-status professions. Looking at between/within study variance in the
>>>>>> predictor lets us disentangle those possibilities, at least partially.
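>>>>>>
>>>>>> Continuing the minimal sketch above, with the same hypothetical data frame:
>>>>>>
>>>>>> library(dplyr)
>>>>>> library(robumeta)
>>>>>>
>>>>>> dat_c <- dat %>%
>>>>>>   group_by(study) %>%
>>>>>>   mutate(pct_female_btw  = mean(pct_female),
>>>>>>          pct_female_wthn = pct_female - pct_female_btw) %>%
>>>>>>   ungroup()
>>>>>>
>>>>>> robu(yi ~ pct_female_wthn + pct_female_btw,
>>>>>>      data = dat_c, studynum = study, var.eff.size = vi)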
>>>>>>
>>>>>> James
>>>>>>
>>>>>> On Wed, Jun 3, 2020 at 9:27 AM Simon Harmel <sim.harmel using gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Indeed that was the problem, Greta, Thanks.
>>>>>>>
>>>>>>> But James, in meta-analysis it is very common to have multiple
>>>>>>> categorical variables, each with several levels, that vary both
>>>>>>> within and between studies.
>>>>>>>
>>>>>>> So, if we need to do this for each level of each such categorical
>>>>>>> variable, it would certainly become a daunting task, in addition to
>>>>>>> making the model extremely large.
>>>>>>>
>>>>>>> My follow-up question is: what is your strategy after you create the
>>>>>>> within and between dummies for each such categorical variable? What
>>>>>>> are the next steps?
>>>>>>>
>>>>>>> Thank you very much, Simon
>>>>>>>
>>>>>>> p.s. After your `robu()` call I get: `Warning message: In
>>>>>>> sqrt(eigenval) : NaNs produced`
>>>>>>>
>>>>>>> On Wed, Jun 3, 2020 at 8:45 AM Gerta Ruecker <
>>>>>>> ruecker using imbi.uni-freiburg.de> wrote:
>>>>>>>
>>>>>>>> Simon
>>>>>>>>
>>>>>>>> Maybe there should not be a line break between "Relative" and
>>>>>>>> "Rating"?
>>>>>>>>
>>>>>>>> For characters, for example if they are used as legends, line
>>>>>>>> breaks
>>>>>>>> sometimes matter.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Gerta
>>>>>>>>
>>>>>>>> Am 03.06.2020 um 15:32 schrieb James Pustejovsky:
>>>>>>>> > I'm not sure what produced that error and I cannot reproduce it. It
>>>>>>>> > may have something to do with the version of dplyr. Here's an
>>>>>>>> > alternative way to recode the Scoring variable, which might be less
>>>>>>>> > prone to versioning differences:
>>>>>>>> >
>>>>>>>> > library(dplyr)
>>>>>>>> > library(fastDummies)
>>>>>>>> > library(robumeta)
>>>>>>>> >
>>>>>>>> > data("oswald2013")
>>>>>>>> >
>>>>>>>> > oswald_centered <-
>>>>>>>> >    oswald2013 %>%
>>>>>>>> >
>>>>>>>> >    # make dummy variables
>>>>>>>> >    mutate(
>>>>>>>> >      Scoring = factor(Scoring,
>>>>>>>> >                       levels = c("Absolute", "Difference Score", "Relative Rating"),
>>>>>>>> >                       labels = c("Absolute", "Difference", "Relative"))
>>>>>>>> >    ) %>%
>>>>>>>> >    dummy_columns(select_columns = "Scoring") %>%
>>>>>>>> >
>>>>>>>> >    # centering by study
>>>>>>>> >    group_by(Study) %>%
>>>>>>>> >    mutate_at(vars(starts_with("Scoring_")),
>>>>>>>> >              list(wthn = ~ . - mean(.), btw = ~ mean(.))) %>%
>>>>>>>> >
>>>>>>>> >    # calculate Fisher Z and variance
>>>>>>>> >    mutate(
>>>>>>>> >      Z = atanh(R),
>>>>>>>> >      V = 1 / (N - 3)
>>>>>>>> >    )
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > # Use the predictors in a meta-regression model
>>>>>>>> > # with Scoring = Absolute as the omitted category
>>>>>>>> >
>>>>>>>> > robu(Z ~ Scoring_Difference_wthn + Scoring_Relative_wthn +
>>>>>>>> >        Scoring_Difference_btw + Scoring_Relative_btw,
>>>>>>>> >      data = oswald_centered, studynum = Study, var.eff.size = V)
>>>>>>>> >
>>>>>>>> > On Tue, Jun 2, 2020 at 10:20 PM Simon Harmel <
>>>>>>>> sim.harmel using gmail.com> wrote:
>>>>>>>> >
>>>>>>>> >> Many thanks, James! I keep getting the following error when I
>>>>>>>> run your
>>>>>>>> >> code:
>>>>>>>> >>
>>>>>>>> >> Error: unexpected symbol in:
>>>>>>>> >> "Rating" = "Relative")
>>>>>>>> >> oswald_centered"
>>>>>>>> >>
>>>>>>>> >> On Tue, Jun 2, 2020 at 10:00 PM James Pustejovsky <
>>>>>>>> jepusto using gmail.com>
>>>>>>>> >> wrote:
>>>>>>>> >>
>>>>>>>> >>> Hi Simon,
>>>>>>>> >>>
>>>>>>>> >>> The same strategy can be followed by using dummy variables for
>>>>>>>> each
>>>>>>>> >>> unique level of a categorical moderator. The idea would be to
>>>>>>>> 1) create
>>>>>>>> >>> dummy variables for each category, 2) calculate the study-level
>>>>>>>> means of
>>>>>>>> >>> the dummy variables (between-cluster predictors), and 3)
>>>>>>>> calculate the
>>>>>>>> >>> group-mean centered dummy variables (within-cluster
>>>>>>>> predictors). Just like
>>>>>>>> >>> if you're working with regular categorical predictors, you'll
>>>>>>>> have to pick
>>>>>>>> >>> one reference level to omit when using these sets of predictors.
>>>>>>>> >>>
>>>>>>>> >>> Here is an example of how to carry out such calculations in R,
>>>>>>>> using the
>>>>>>>> >>> fastDummies package along with a bit of dplyr:
>>>>>>>> >>>
>>>>>>>> >>> library(dplyr)
>>>>>>>> >>> library(fastDummies)
>>>>>>>> >>> library(robumeta)
>>>>>>>> >>>
>>>>>>>> >>> data("oswald2013")
>>>>>>>> >>>
>>>>>>>> >>> oswald_centered <-
>>>>>>>> >>>    oswald2013 %>%
>>>>>>>> >>>
>>>>>>>> >>>    # make dummy variables
>>>>>>>> >>>    mutate(
>>>>>>>> >>>      Scoring = recode(Scoring, "Difference Score" = "Difference",
>>>>>>>> >>>                       "Relative Rating" = "Relative")
>>>>>>>> >>>    ) %>%
>>>>>>>> >>>    dummy_columns(select_columns = "Scoring") %>%
>>>>>>>> >>>
>>>>>>>> >>>    # centering by study
>>>>>>>> >>>    group_by(Study) %>%
>>>>>>>> >>>    mutate_at(vars(starts_with("Scoring_")),
>>>>>>>> >>>              list(wthn = ~ . - mean(.), btw = ~ mean(.))) %>%
>>>>>>>> >>>
>>>>>>>> >>>    # calculate Fisher Z and variance
>>>>>>>> >>>    mutate(
>>>>>>>> >>>      Z = atanh(R),
>>>>>>>> >>>      V = 1 / (N - 3)
>>>>>>>> >>>    )
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> # Use the predictors in a meta-regression model
>>>>>>>> >>> # with Scoring = Absolute as the omitted category
>>>>>>>> >>>
>>>>>>>> >>> robu(Z ~ Scoring_Difference_wthn + Scoring_Relative_wthn +
>>>>>>>> >>>        Scoring_Difference_btw + Scoring_Relative_btw,
>>>>>>>> >>>      data = oswald_centered, studynum = Study, var.eff.size = V)
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> Kind Regards,
>>>>>>>> >>> James
>>>>>>>> >>>
>>>>>>>> >>> On Tue, Jun 2, 2020 at 6:49 PM Simon Harmel <
>>>>>>>> sim.harmel using gmail.com> wrote:
>>>>>>>> >>>
>>>>>>>> >>>> Hi All,
>>>>>>>> >>>>
>>>>>>>> >>>> Page 13 (top of the page) of this article:
>>>>>>>> >>>> https://cran.r-project.org/web/packages/robumeta/vignettes/robumetaVignette.pdf
>>>>>>>> >>>> recommends that if a *continuous moderator* varies both within and
>>>>>>>> >>>> across studies in a meta-analysis, a strategy is to break that
>>>>>>>> >>>> moderator down into two moderators by:
>>>>>>>> >>>>
>>>>>>>> >>>> *(a)* taking the mean of each study (between-cluster effect),
>>>>>>>> >>>>
>>>>>>>> >>>> *(b)* centering the predictor within each study
>>>>>>>> (within-cluster effect).
>>>>>>>> >>>>
>>>>>>>> >>>> BUT what if my original moderator that varies both within and
>>>>>>>> >>>> across studies is a *"categorical"* moderator?
>>>>>>>> >>>>
>>>>>>>> >>>> I would appreciate an R demonstration of the recommended strategy.
>>>>>>>> >>>> Thanks,
>>>>>>>> >>>> Simon
>>>>>>>> >>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Dr. rer. nat. Gerta Rücker, Dipl.-Math.
>>>>>>>>
>>>>>>>> Institute of Medical Biometry and Statistics,
>>>>>>>> Faculty of Medicine and Medical Center - University of Freiburg
>>>>>>>>
>>>>>>>> Stefan-Meier-Str. 26, D-79104 Freiburg, Germany
>>>>>>>>
>>>>>>>> Phone:    +49/761/203-6673
>>>>>>>> Fax:      +49/761/203-6680
>>>>>>>> Mail:     ruecker using imbi.uni-freiburg.de
>>>>>>>> Homepage: https://www.uniklinik-freiburg.de/imbi.html
>>>>>>>>
>>>>>>>>
