[R-sig-ME] Cluster-robust SEs & random effects -- seeking some clarification

Sun Jul 31 02:36:11 CEST 2022

   Yes.

On 2022-07-30 8:12 p.m., J.D. Haltigan wrote:
> Thanks, Ben. So in the model you remarked on, would that be a 
> 'random-intercepts only' model?
> 
> 
> On Sat, Jul 30, 2022 at 7:53 PM Ben Bolker <bbolker using gmail.com 
> <mailto:bbolker using gmail.com>> wrote:
> 
>     I haven't been following the whole thread that carefully, but I want to
>     emphasize that
> 
>         posXsymp~treatment + pairID + (1 | union)
> 
>     is *not*, by any definition I'm familiar with, a "random-slopes model";
>     that is, it only estimates a single population-level treatment
>     effect/doesn't allow the effect of treatment to vary across groups
>     defined by 'union'.  You would need a random-effect term of the form
>     (treatment | union).
> 
>         Reasons why you might *not* want to do this:
> 
>        * if treatment only varies across and not within levels of union
>     ("union is nested within treatment" according to some terminology),
>     then
>     this variation is unidentifiable
>        * maybe you have decided that you don't have enough data/want a more
>     parsimonious model.
> 
>         Schielzeth and Forstmeier, among many others (this is the example I
>     know of), have cautioned about the consequences of leaving out
>     random-slopes terms.
> 
>     Schielzeth, Holger, and Wolfgang Forstmeier. “Conclusions beyond
>     Support: Overconfident Estimates in Mixed Models.” Behavioral Ecology
>     20, no. 2 (March 1, 2009): 416–20.
>     https://doi.org/10.1093/beheco/arn145
>     <https://doi.org/10.1093/beheco/arn145>.
> 
> 
>     On 2022-07-30 7:43 p.m., J.D. Haltigan wrote:
>      > Addendum:
>      >
>      > It just occurred to me on my walk that I think I am getting a bit
>     lost in
>      > some of the differences in nomenclature across scientific silos.
>     In the
>      > original model that they specified, which treated the 'pairID'
>     variable as
>      > a control variable for which they controlled for 'fixed effects' of
>      > control/treatment villages (in their own language in the paper) using
>      > cluster-robust SEs, I think this is indeed a 'random-intercepts
>     only' model
>      > in the language of Hamaker et al. They implement the 'absorb'
>     command in
>      > STATA which I believe aggregates across the pairIDs to generate an
>      > 'omnibus' F-test of sorts for the pairID variable (in the ANOVA
>      > nomenclature). I say this as when I specify the pairID variable
>     in the lmer
>      > model I shared (or in a fixest model I conducted to replicate the
>     original
>      > Abalauck results in R), I get the estimates for all the pairs
>     (i.e., there
>      > is no way to aggregate across them--though I think formally the
>     models are
>      > the same if we are unconcerned about any one pairID
>     [treatment/control
>      > village pair].
>      >
>      > So, in the lmer model I shared where I specify a specific random
>     effects
>      > term for the 'cluster' variable, I think this indeed is allowing
>     for random
>      > slopes across the clusters which implies the treatment effect may
>     vary
>      > across the clusters (and we might anticipate it will for various
>     reasons I
>      > can elaborate on). More generally: we are generalizing to *any*
>     universe of
>      > villages (say in the entire world) where the treatment
>     intervention (masks)
>      > may vary across villages. This is the crux of invoking the random
>     effects
>      > model (i.e., random slopes model).
>      >
>      > I realize this is a mouthful, but I think the way these terms (e.g.,
>      > random/fixed effects models etc.) are used across disciplines
>     makes things
>      > a bit confusing.
>      >
>      > On Sat, Jul 30, 2022 at 5:25 PM J.D. Haltigan <jhaltiga using gmail.com
>     <mailto:jhaltiga using gmail.com>> wrote:
>      >
>      >> This is a very helpful walkthrough, James. My responses are
>     italicized
>      >> under yours to maintain thread readability. The key is
>     Generalizability
>      >> here and (as I also note in my last reply) the idea is to
>     Generalize to a
>      >> universe of "any villages or clusters." That is, the target
>     population we
>      >> are generalizing to is *any* random population.
>      >>
>      >> On Sat, Jul 30, 2022 at 3:01 PM James Pustejovsky
>     <jepusto using gmail.com <mailto:jepusto using gmail.com>>
>      >> wrote:
>      >>
>      >>> Hi J.D.,
>      >>> A few comments/reactions inline below.
>      >>> James
>      >>>
>      >>> On Wed, Jul 27, 2022 at 5:37 PM J.D. Haltigan
>     <jhaltiga using gmail.com <mailto:jhaltiga using gmail.com>> wrote:
>      >>>
>      >>>> ...
>      >>>>
>      >>> In the original investigation, the authors did not invoke a random
>      >>>> effects model (but did use the pairIDs to control for fixed
>     effects as
>      >>>> noted and with robust SEs). Thus, in the original
>     investigation there was
>      >>>> *no* specification of a random effects model for the 'cluster'
>     variable. We
>      >>>> know from some other work there were some biases in village
>     mapping and
>      >>>> other possible sources of between-cluster variation that might be
>      >>>> anticipated to have influence--at the random intercepts
>     level--so we are
>      >>>> looking into how specifying 'cluster' as a random effect might
>     change the
>      >>>> fixed effects estimates for the treatment intervention effect.
>     In the
>      >>>> Hamaker et al. language, it is indeed a 'random intercepts'
>     only model.
>      >>>>
>      >>>
>      >>> I don't follow how using a random intercepts model improves the
>      >>> generalizability warrant here. The random intercepts model is
>     essentially
>      >>> just a re-weighted average of the pair-specific effects in the
>     original
>      >>> analysis, where the weights are optimally efficient if the model is
>      >>> correctly specified. That last clause carries a lot of weight
>     here--correct
>      >>> specification means 1) treatment assignment is unrelated to the
>     random
>      >>> effects, 2) the treatment effect is constant across clusters, 3)
>      >>> distributional assumptions are valid (i.e., homoskedasticity at
>     each level
>      >>> of the model).
>      >>>
>      >>> If the effects are heterogeneous, then I would think that including
>      >>> random slopes on the treatment indicator would provide a better
>     basis for
>      >>> generalization. But even then, the warrant is still pretty
>     vague---what is
>      >>> the hypothetical population of villages from which the observed
>     villages
>      >>> are sampled?
>      >>>
>      >>
>      >> *In the most basic model (without baseline controls) the model
>     takes the
>      >> form: myModel = lmer(posXsymp~treatment + pairID + (1 | union),
>     data =
>      >> myData). I believe--correct me if I am wrong--that this reflects a
>      >> random-intercepts only model, but I may be mistaken. If I am,
>     and this is
>      >> allowing for random slopes on the treatment indicator, then I
>     will need to
>      >> rethink my statements.  *
>      >>
>      >>>
>      >>>
>      >>>> Given this, however, does it also make sense to include the
>     cluster
>      >>>> robust SEs for the fixed effects which would account for possible
>      >>>> heterogeneity of treatment effects (i.e., slopes) across
>     clusters?s
>      >>>>
>      >>>> If you're committed to the random intercepts model, then yes I
>     think so
>      >>> because using cluster robust SEs at least acknowledges the
>     possibility of
>      >>> heterogeneous treatment effects.
>      >>>
>      >>
>      >> *If the above model does allow for both random intercepts and
>     slopes, then
>      >> perhaps the use of cluster robust SEs is redundant in some sense
>     since the
>      >> random slopes would be modeling the heterogeneity in treatment
>     effects?*
>      >>
>      >>>
>      >>>
>      >>>
>      >>>> Bottom line: in their original analyses, clusters are seen as
>      >>>> interchangeable from a conceptual perspective (rather than
>     drawn from a
>      >>>> random universe of observations). When one scales up evidence
>     to a universe
>      >>>> of observations that are random (as they would be in the
>     intended universe
>      >>>> of inference in the real-world), then we are better
>     positioned, I think, to
>      >>>> adjudicate whether the mask intervention effect is 'practically
>      >>>> significant' (in addition to whether the focal effect remains
>     marginally
>      >>>> significant from a frequentist perspective).
>      >>>>
>      >>> As noted above, this argument is a bit vague to me. If there's
>     concern
>      >>> about generalizability, then my first question would be: what
>     is the target
>      >>> population to which you are trying to generalize?
>      >>>
>      >>
>      >> *Essentially, the target population we are trying to generalize
>     to is a
>      >> random selection of villages. Any random selection of villages.
>     In other
>      >> words, villages should not be seen as interchangeable. We are
>     interested in
>      >> whether the effects generalize to any randomly selected village. *
>      >>
>      >>>
>      >>>
>      >>
>      >
>      >       [[alternative HTML version deleted]]
>      >
>      > _______________________________________________
>      > R-sig-mixed-models using r-project.org
>     <mailto:R-sig-mixed-models using r-project.org> mailing list
>      > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> 
>     -- 
>     Dr. Benjamin Bolker
>     Professor, Mathematics & Statistics and Biology, McMaster University
>     Director, School of Computational Science and Engineering
>     (Acting) Graduate chair, Mathematics & Statistics
> 
>     _______________________________________________
>     R-sig-mixed-models using r-project.org
>     <mailto:R-sig-mixed-models using r-project.org> mailing list
>     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> 

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics