[R-sig-ME] Cluster-robust SEs & random effects -- seeking some clarification

J.D. Haltigan jh@|t|g@ @end|ng |rom gm@||@com
Tue Aug 16 23:47:00 CEST 2022


Thanks, James. The McNeish & Kelley (2019) paper is one I was not aware of,
despite having read several other Kelley-authored articles.

Indeed, that paper provides a point of departure for a question on my work
on the Bangladesh RCT mask-intervention study mentioned earlier.

In short: they used cluster-affiliated dummy variables (read: the pairID
variable) in a fixed-effects model. For their linear run with baseline
controls, their Stata code was:
reghdfe posXsymp treatment proper_mask_base prop_resp_ill_base_2,
absorb(pairID) vce(cluster union)
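
For readers who want an R-side analogue of that reghdfe call, here is a
minimal sketch on simulated data (not the study data). It assumes the
sandwich and lmtest packages; the data frame d, the outcome y, and all
effect sizes are made up for illustration. Pair dummies stand in for
absorb(pairID), and vcovCL() clusters the standard errors on union.
Small-sample corrections may differ from Stata's, so the numbers need not
match reghdfe exactly.

```r
library(sandwich)  # vcovCL(): cluster-robust covariance matrices
library(lmtest)    # coeftest(): coefficient tests with a supplied vcov

set.seed(2)
n_pairs <- 30   # hypothetical number of matched pairs
n_per   <- 25   # hypothetical number of individuals per union

# Two unions per pair; unions 1-2 form pair 1, unions 3-4 form pair 2, etc.
d <- data.frame(
  pairID = factor(rep(seq_len(n_pairs), each = 2 * n_per)),
  union  = factor(rep(seq_len(2 * n_pairs), each = n_per))
)

# Within each pair, one union is randomly assigned to treatment.
trt_union   <- as.vector(replicate(n_pairs, sample(0:1)))
d$treatment <- trt_union[as.integer(d$union)]

# Simulated outcome: pair effect + union effect + individual noise.
pair_eff  <- rnorm(n_pairs, sd = 0.05)
union_eff <- rnorm(2 * n_pairs, sd = 0.05)
d$y <- 0.1 - 0.02 * d$treatment +
  pair_eff[as.integer(d$pairID)] +
  union_eff[as.integer(d$union)] +
  rnorm(nrow(d), sd = 0.3)

# Pair dummies as fixed effects, SEs clustered on union.
fit <- lm(y ~ treatment + pairID, data = d)
ct  <- coeftest(fit, vcov. = vcovCL(fit, cluster = ~ union))
print(ct["treatment", ])
```

Only the treatment row is printed because the pair dummies are nuisance
terms here, as they are when absorbed in reghdfe.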

In translating this to a random-effects model using lmer, does it make
sense to include the pairID variable in the model *if* I treat the cluster
variable as its own random effect as:

lme4_1_B = lmer(posXsymp~treatment+proper_mask_base+prop_resp_ill_base_2 +
pairID + (1 | union), data = bdata.raw1)#lme4 package

As I mentioned previously, the lmer code above is a random-intercepts-only
model. This is by design, as there are mean-level differences among the
clusters to begin with on several background variables, and these are
captured by the random effects. I am also making a conceptual case that, for
the mask study to have appropriate generalizability, one must assume or
treat clusters as *randomly* selected from a larger population of clusters.
Otherwise, any marginal effect of the mask intervention (while perhaps more
accurately estimated in a fixed-effects model) is not going to generalize
to any population of human interactions.

My focal question nonetheless concerns how to treat the pairID variable in
my translation of their fixed-effects model to a random-effects model in
lmer. If I include the pairID variable as above, what does it reflect, given
that cluster is treated as a random effect? I have a separate model where I
drop the pairID variable:

lme4_1 = lmer(posXsymp~treatment+proper_mask_base+prop_resp_ill_base_2 + (1
| union), data = bdata.raw1)#lme4 package

*What is the substantive difference between these two models?* My sense is
that this gets at the separation of between/within effects and that the
pairID variable in their original Stata fixed-effects model (a
cluster-affiliated variable in the language of McNeish & Kelley) is
analogous to the cluster variable itself, BUT in their model, a) the
assumption is that clusters are interchangeable (not drawn from a random
population); and b) one cannot estimate within-cluster/between-cluster
effects using their parameterization (i.e., random effects--in my case
intercepts--for the clusters).
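
One way to probe the difference between the two lmer specifications is a
small simulation. The sketch below uses made-up data (the data frame d, the
outcome y, and all variances are assumptions, not the study data) in which
unions are nested in pairs and one union per pair is treated. Note that
pairID has to be a factor for lmer to expand it into pair dummies; if it is
stored as numeric, it enters as a single linear slope instead. In this
setup, the union variance from the model with pair dummies should reflect
only the union-to-union variation left within pairs, while the model
without them should fold the between-pair variation into the union
intercepts.

```r
library(lme4)

set.seed(1)
n_pairs <- 50   # hypothetical number of matched pairs
n_per   <- 20   # hypothetical number of individuals per union

# Union-level table: two unions per pair, one randomly treated.
unions <- data.frame(
  pairID    = rep(seq_len(n_pairs), each = 2),
  union     = seq_len(2 * n_pairs),
  treatment = as.vector(replicate(n_pairs, sample(0:1)))
)

# True union intercept = pair effect (sd 1) + union effect (sd 0.5).
pair_eff <- rnorm(n_pairs, sd = 1)
unions$u <- pair_eff[unions$pairID] + rnorm(nrow(unions), sd = 0.5)

# Expand to individuals and add residual noise (sd 1).
d   <- unions[rep(seq_len(nrow(unions)), each = n_per), ]
d$y <- 0.3 * d$treatment + d$u + rnorm(nrow(d), sd = 1)
d$pairID <- factor(d$pairID)
d$union  <- factor(d$union)

# Model with pair dummies plus a random intercept for union.
m_pair   <- lmer(y ~ treatment + pairID + (1 | union), data = d)
# Model with only the random intercept for union.
m_nopair <- lmer(y ~ treatment + (1 | union), data = d)

# Extract the estimated union-level intercept variance from each fit.
union_var <- function(m) {
  vc <- as.data.frame(VarCorr(m))
  vc$vcov[vc$grp == "union"]
}
union_var_pair   <- union_var(m_pair)
union_var_nopair <- union_var(m_nopair)

print(c(with_pairID = union_var_pair, without_pairID = union_var_nopair))
print(c(with_pairID    = unname(fixef(m_pair)["treatment"]),
        without_pairID = unname(fixef(m_nopair)["treatment"])))
```

Comparing the two printed variances and the two treatment estimates on
data shaped like bdata.raw1 would show how much of the union-level
variation the pair dummies are absorbing.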

I realize this is a bit of a mouthful, but I was inspired to post after
reading the McNeish & Kelley paper and needed to get this out for my own
thinking.

-JD

On Mon, Aug 15, 2022 at 10:00 PM James Pustejovsky <jepusto using gmail.com>
wrote:

>
>> When you note, 'if you trust the specification of your random effects
>> structure' can you elaborate on this? I imagine in the extreme, no random
>> effects structure will ever truly be perfect, so I guess it comes down to
>> some combination of theory, practicality, and model tractability?
>>
>
> Sure. Clearly, any model is a stylized and approximate representation of
> the true process. By "trust the specification" I just mean that you--and
> usually, also readers or potential critics--think that the random effects
> structure of the model is an adequate representation of the features of the
> data-generating process. In more colloquial terms, did you (the analyst) do
> a good job of developing the model?
>
> I think it's pretty helpful to think about this stuff in terms of
> convincing an audience. In practice, and given the current reporting
> conventions in social science disciplines, it's often pretty hard for
> readers/reviewers/critics to gauge whether an analyst has done a good job.
> In such contexts, cluster-robust SEs give some additional assurance (or
> insurance, the analogy in my previous message) that the inferences can be
> trusted even if the analyst didn't engage in a thorough, diligent
> model-building process.
>
> James
>
