[R-sig-ME] Cluster-robust SEs & random effects -- seeking some clarification
Ben Bolker
bbo|ker @end|ng |rom gm@||@com
Sun Jul 31 02:36:11 CEST 2022
Yes.
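
For concreteness, a minimal sketch of the distinction in the quoted thread below: the formula with (1 | union) fits a random-intercepts-only model, while (1 + treatment | union) additionally lets the treatment effect vary by union. Everything here is an assumption made for illustration, not the original analysis: the data are simulated, the outcome `y` is continuous rather than the `posXsymp` variable, and treatment is assigned within unions so that the slope variance is identifiable.

```r
library(lme4)

set.seed(101)

# Hypothetical design: 20 pairs, 2 unions per pair, 30 individuals per union
n_pair <- 20
unions_per_pair <- 2
n_per_union <- 30
n_union <- n_pair * unions_per_pair

union_id <- rep(seq_len(n_union), each = n_per_union)
d <- data.frame(
  union     = factor(union_id),
  pairID    = factor((union_id + 1) %/% 2),  # unions 1,2 -> pair 1; 3,4 -> pair 2; ...
  # Assumption: treatment varies WITHIN union (individual-level assignment)
  treatment = rbinom(length(union_id), size = 1, prob = 0.5)
)

# Union-level deviations in baseline (u0) and in the treatment effect (u1)
u0 <- rnorm(n_union, sd = 0.5)
u1 <- rnorm(n_union, sd = 0.3)
d$y <- 1 + (0.4 + u1[union_id]) * d$treatment + u0[union_id] +
  rnorm(nrow(d), sd = 1)

# Random intercepts only: one population-level treatment effect
m_int <- lmer(y ~ treatment + pairID + (1 | union), data = d)

# Random slopes: treatment effect allowed to vary across unions
m_slope <- lmer(y ~ treatment + pairID + (1 + treatment | union), data = d)

print(VarCorr(m_int))    # union intercept variance + residual
print(VarCorr(m_slope))  # adds a treatment-slope variance and a correlation
```

If treatment were instead constant within each union (assigned at the union level), the slope variance in the second model would not be identifiable, which is the first caveat listed in the quoted message.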
On 2022-07-30 8:12 p.m., J.D. Haltigan wrote:
> Thanks, Ben. So in the model you remarked on, would that be a
> 'random-intercepts only' model?
>
>
> On Sat, Jul 30, 2022 at 7:53 PM Ben Bolker <bbolker using gmail.com> wrote:
>
> I haven't been following the whole thread that carefully, but I want to
> emphasize that
>
> posXsymp~treatment + pairID + (1 | union)
>
> is *not*, by any definition I'm familiar with, a "random-slopes model";
> that is, it only estimates a single population-level treatment
> effect/doesn't allow the effect of treatment to vary across groups
> defined by 'union'. You would need a random-effect term of the form
> (treatment | union).
>
> Reasons why you might *not* want to do this:
>
> * if treatment only varies across and not within levels of union
> ("union is nested within treatment" according to some terminology),
> then this variation is unidentifiable
> * maybe you have decided that you don't have enough data/want a more
> parsimonious model.
>
> Schielzeth and Forstmeier, among many others (this is the example I
> know of), have cautioned about the consequences of leaving out
> random-slopes terms.
>
> Schielzeth, Holger, and Wolfgang Forstmeier. “Conclusions beyond
> Support: Overconfident Estimates in Mixed Models.” Behavioral Ecology
> 20, no. 2 (March 1, 2009): 416–20.
> https://doi.org/10.1093/beheco/arn145
>
>
> On 2022-07-30 7:43 p.m., J.D. Haltigan wrote:
> > Addendum:
> >
> > It just occurred to me on my walk that I think I am getting a bit lost in
> > some of the differences in nomenclature across scientific silos. In the
> > original model that they specified, which treated the 'pairID' variable as
> > a control variable for which they controlled for 'fixed effects' of
> > control/treatment villages (in their own language in the paper) using
> > cluster-robust SEs, I think this is indeed a 'random-intercepts only' model
> > in the language of Hamaker et al. They implement the 'absorb' command in
> > Stata which I believe aggregates across the pairIDs to generate an
> > 'omnibus' F-test of sorts for the pairID variable (in the ANOVA
> > nomenclature). I say this as when I specify the pairID variable in the lmer
> > model I shared (or in a fixest model I conducted to replicate the original
> > Abaluck results in R), I get the estimates for all the pairs (i.e., there
> > is no way to aggregate across them--though I think formally the models are
> > the same if we are unconcerned about any one pairID [treatment/control
> > village pair]).
> >
> > So, in the lmer model I shared where I specify a specific random effects
> > term for the 'cluster' variable, I think this indeed is allowing for random
> > slopes across the clusters which implies the treatment effect may vary
> > across the clusters (and we might anticipate it will for various reasons I
> > can elaborate on). More generally: we are generalizing to *any* universe of
> > villages (say in the entire world) where the treatment intervention (masks)
> > may vary across villages. This is the crux of invoking the random effects
> > model (i.e., random slopes model).
> >
> > I realize this is a mouthful, but I think the way these terms (e.g.,
> > random/fixed effects models etc.) are used across disciplines makes things
> > a bit confusing.
> >
> > On Sat, Jul 30, 2022 at 5:25 PM J.D. Haltigan <jhaltiga using gmail.com> wrote:
> >
> >> This is a very helpful walkthrough, James. My responses are italicized
> >> under yours to maintain thread readability. The key is Generalizability
> >> here and (as I also note in my last reply) the idea is to Generalize to a
> >> universe of "any villages or clusters." That is, the target population we
> >> are generalizing to is *any* random population.
> >>
> >> On Sat, Jul 30, 2022 at 3:01 PM James Pustejovsky <jepusto using gmail.com>
> >> wrote:
> >>
> >>> Hi J.D.,
> >>> A few comments/reactions inline below.
> >>> James
> >>>
> >>> On Wed, Jul 27, 2022 at 5:37 PM J.D. Haltigan <jhaltiga using gmail.com> wrote:
> >>>
> >>>> ...
> >>>>
> >>>> In the original investigation, the authors did not invoke a random
> >>>> effects model (but did use the pairIDs to control for fixed effects as
> >>>> noted and with robust SEs). Thus, in the original investigation there was
> >>>> *no* specification of a random effects model for the 'cluster' variable. We
> >>>> know from some other work there were some biases in village mapping and
> >>>> other possible sources of between-cluster variation that might be
> >>>> anticipated to have influence--at the random intercepts level--so we are
> >>>> looking into how specifying 'cluster' as a random effect might change the
> >>>> fixed effects estimates for the treatment intervention effect. In the
> >>>> Hamaker et al. language, it is indeed a 'random intercepts' only model.
> >>>>
> >>>
> >>> I don't follow how using a random intercepts model improves the
> >>> generalizability warrant here. The random intercepts model is essentially
> >>> just a re-weighted average of the pair-specific effects in the original
> >>> analysis, where the weights are optimally efficient if the model is
> >>> correctly specified. That last clause carries a lot of weight here--correct
> >>> specification means 1) treatment assignment is unrelated to the random
> >>> effects, 2) the treatment effect is constant across clusters, 3)
> >>> distributional assumptions are valid (i.e., homoskedasticity at each level
> >>> of the model).
> >>>
> >>> If the effects are heterogeneous, then I would think that including
> >>> random slopes on the treatment indicator would provide a better basis for
> >>> generalization. But even then, the warrant is still pretty vague--what is
> >>> the hypothetical population of villages from which the observed villages
> >>> are sampled?
> >>>
> >>
> >> *In the most basic model (without baseline controls) the model takes the
> >> form:
> >> myModel = lmer(posXsymp~treatment + pairID + (1 | union), data = myData).
> >> I believe--correct me if I am wrong--that this reflects a
> >> random-intercepts only model, but I may be mistaken. If I am, and this is
> >> allowing for random slopes on the treatment indicator, then I will need to
> >> rethink my statements.*
> >>
> >>>
> >>>
> >>>> Given this, however, does it also make sense to include the cluster
> >>>> robust SEs for the fixed effects which would account for possible
> >>>> heterogeneity of treatment effects (i.e., slopes) across clusters?
> >>>>
> >>> If you're committed to the random intercepts model, then yes I think so
> >>> because using cluster robust SEs at least acknowledges the possibility of
> >>> heterogeneous treatment effects.
> >>>
> >>
> >> *If the above model does allow for both random intercepts and slopes, then
> >> perhaps the use of cluster robust SEs is redundant in some sense since the
> >> random slopes would be modeling the heterogeneity in treatment effects?*
> >>
> >>>
> >>>
> >>>
> >>>> Bottom line: in their original analyses, clusters are seen as
> >>>> interchangeable from a conceptual perspective (rather than drawn from a
> >>>> random universe of observations). When one scales up evidence to a universe
> >>>> of observations that are random (as they would be in the intended universe
> >>>> of inference in the real-world), then we are better positioned, I think, to
> >>>> adjudicate whether the mask intervention effect is 'practically
> >>>> significant' (in addition to whether the focal effect remains marginally
> >>>> significant from a frequentist perspective).
> >>>>
> >>> As noted above, this argument is a bit vague to me. If there's concern
> >>> about generalizability, then my first question would be: what is the target
> >>> population to which you are trying to generalize?
> >>>
> >>
> >> *Essentially, the target population we are trying to generalize to is a
> >> random selection of villages. Any random selection of villages. In other
> >> words, villages should not be seen as interchangeable. We are interested in
> >> whether the effects generalize to any randomly selected village.*
> >>
> >>>
> >>>
> >>
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics