[R-sig-ME] Cluster-robust SEs & random effects -- seeking some clarification

J.D. Haltigan jh@|t|g@ @end|ng |rom gm@||@com
Sun Jul 31 01:43:04 CEST 2022


It just occurred to me on my walk that I think I am getting a bit lost in
some of the differences in nomenclature across scientific silos. In the
original model that they specified, which treated the 'pairID' variable as
a control variable for which they controlled for 'fixed effects' of
control/treatment villages (in their own language in the paper) using
cluster-robust SEs, I think this is indeed a 'random-intercepts only' model
in the language of Hamaker et al. They implement the 'absorb' command in
STATA which I believe aggregates across the pairIDs to generate an
'omnibus' F-test of sorts for the pairID variable (in the ANOVA
nomenclature). I say this as when I specify the pairID variable in the lmer
model I shared (or in a fixest model I conducted to replicate the original
Abalauck results in R), I get the estimates for all the pairs (i.e., there
is no way to aggregate across them--though I think formally the models are
the same if we are unconcerned about any one pairID [treatment/control
village pair].

So, in the lmer model I shared where I specify a specific random effects
term for the 'cluster' variable, I think this indeed is allowing for random
slopes across the clusters which implies the treatment effect may vary
across the clusters (and we might anticipate it will for various reasons I
can elaborate on). More generally: we are generalizing to *any* universe of
villages (say in the entire world) where the treatment intervention (masks)
may vary across villages. This is the crux of invoking the random effects
model (i.e., random slopes model).

I realize this is a mouthful, but I think the way these terms (e.g.,
random/fixed effects models etc.) are used across disciplines makes things
a bit confusing.

On Sat, Jul 30, 2022 at 5:25 PM J.D. Haltigan <jhaltiga using gmail.com> wrote:

> This is a very helpful walkthrough, James. My responses are italicized
> under yours to maintain thread readability. The key is Generalizability
> here and (as I also note in my last reply) the idea is to Generalize to a
> universe of "any villages or clusters." That is, the target population we
> are generalizing to is *any* random population.
> On Sat, Jul 30, 2022 at 3:01 PM James Pustejovsky <jepusto using gmail.com>
> wrote:
>> Hi J.D.,
>> A few comments/reactions inline below.
>> James
>> On Wed, Jul 27, 2022 at 5:37 PM J.D. Haltigan <jhaltiga using gmail.com> wrote:
>>> ...
>> In the original investigation, the authors did not invoke a random
>>> effects model (but did use the pairIDs to control for fixed effects as
>>> noted and with robust SEs). Thus, in the original investigation there was
>>> *no* specification of a random effects model for the 'cluster' variable. We
>>> know from some other work there were some biases in village mapping and
>>> other possible sources of between-cluster variation that might be
>>> anticipated to have influence--at the random intercepts level--so we are
>>> looking into how specifying 'cluster' as a random effect might change the
>>> fixed effects estimates for the treatment intervention effect. In the
>>> Hamaker et al. language, it is indeed a 'random intercepts' only model.
>> I don't follow how using a random intercepts model improves the
>> generalizability warrant here. The random intercepts model is essentially
>> just a re-weighted average of the pair-specific effects in the original
>> analysis, where the weights are optimally efficient if the model is
>> correctly specified. That last clause carries a lot of weight here--correct
>> specification means 1) treatment assignment is unrelated to the random
>> effects, 2) the treatment effect is constant across clusters, 3)
>> distributional assumptions are valid (i.e., homoskedasticity at each level
>> of the model).
>> If the effects are heterogeneous, then I would think that including
>> random slopes on the treatment indicator would provide a better basis for
>> generalization. But even then, the warrant is still pretty vague---what is
>> the hypothetical population of villages from which the observed villages
>> are sampled?
> *In the most basic model (without baseline controls) the model takes the
> form: myModel = lmer(posXsymp~treatment + pairID + (1 | union), data =
> myData). I believe--correct me if I am wrong--that this reflects a
> random-intercepts only model, but I may be mistaken. If I am, and this is
> allowing for random slopes on the treatment indicator, then I will need to
> rethink my statements.  *
>>> Given this, however, does it also make sense to include the cluster
>>> robust SEs for the fixed effects which would account for possible
>>> heterogeneity of treatment effects (i.e., slopes) across clusters?s
>>> If you're committed to the random intercepts model, then yes I think so
>> because using cluster robust SEs at least acknowledges the possibility of
>> heterogeneous treatment effects.
> *If the above model does allow for both random intercepts and slopes, then
> perhaps the use of cluster robust SEs is redundant in some sense since the
> random slopes would be modeling the heterogeneity in treatment effects?*
>>> Bottom line: in their original analyses, clusters are seen as
>>> interchangeable from a conceptual perspective (rather than drawn from a
>>> random universe of observations). When one scales up evidence to a universe
>>> of observations that are random (as they would be in the intended universe
>>> of inference in the real-world), then we are better positioned, I think, to
>>> adjudicate whether the mask intervention effect is 'practically
>>> significant' (in addition to whether the focal effect remains marginally
>>> significant from a frequentist perspective).
>> As noted above, this argument is a bit vague to me. If there's concern
>> about generalizability, then my first question would be: what is the target
>> population to which you are trying to generalize?
> *Essentially, the target population we are trying to generalize to is a
> random selection of villages. Any random selection of villages. In other
> words, villages should not be seen as interchangeable. We are interested in
> whether the effects generalize to any randomly selected village. *

	[[alternative HTML version deleted]]

More information about the R-sig-mixed-models mailing list