[R-sig-ME] Cluster-robust SEs & random effects -- seeking some clarification

James Pustejovsky
Tue Jul 26 05:18:13 CEST 2022


Hi J.D.,

I expect you may find a variety of takes on your question. I'll offer my
own, as someone who's interested in mixed effects models and cluster-robust
standard errors (http://jepusto.github.io/clubSandwich/). There are really
two things going on here. First is the choice between using a regular
linear regression model or using a random effects model. Second is how you
conduct inference (hypothesis tests, confidence intervals, etc.) on the
regression coefficients, where you can either use model-based methods or
cluster-robust methods. Cross those choices and you get four logical
possibilities:
1. Regular linear regression, model-based standard errors (i.e., classical
OLS t-tests/F-tests/CIs). This is clearly not going to work because it
doesn't account for dependence in the errors.
2. Regular linear regression, cluster-robust standard errors.
Cluster-robust methods handle dependence in the errors, so you can trust
the inferences. Regular linear regression may not provide the most
efficient coefficient estimates if you've got dependent error terms and
unequally sized clusters. But, on the plus side, you can easily incorporate
sampling weights. And if you include fixed effects (cluster-specific
dummies), then this lessens concerns about potential cluster-level
confounders of the predictor(s) of interest.
3. Linear mixed model (aka random effects model), model-based standard
errors (i.e., what lmer() spits out automatically). Using random effects
can improve the efficiency of coefficient estimates when you've got
clusters of varying size. But, you can run into trouble if there are
cluster-level confounders of the predictor(s) of interest (which is why
many economists spurn the random effects model). And, it can be tricky to
incorporate sampling weights if those are relevant. Furthermore, using
model-based standard errors for inference amounts to asserting that you
have correctly specified the model in pretty much all respects, including
any random slopes, correlation between level-1 errors, homoskedasticity of
errors at each level of the model, etc. Violation of any of the modeling
assumptions (omitted levels of dependence, omitted random slopes,
heteroskedastic cluster-level variances) could throw off the inferences to
some extent.
4. Linear mixed model, cluster-robust standard errors (using, e.g.,
nlme::lme() or lme4::lmer() with the clubSandwich package linked above).
Using a random effects model to obtain point estimates of the coefficients
has the same benefits and drawbacks in terms of efficiency, potential
confounding concerns, etc. But, you can still use cluster-robust methods
for inference, so that your hypothesis tests and confidence intervals will
be valid even if some aspect of the random effect structure is
mis-specified. For example, cluster-robust methods will work even if you've
omitted a random slope that should actually be there.
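
To make the four options concrete, here is a minimal R sketch. It is only an
illustration under stated assumptions: the data are simulated, the names dat,
y, x, and village are hypothetical, and I use the "CR2" small-sample
correction from clubSandwich as one reasonable choice of cluster-robust
estimator, not the only one.

```r
library(lme4)         # lmer()
library(clubSandwich) # coef_test()

# Simulated, purely illustrative data: 30 villages of unequal size with a
# village-level random intercept. All object names are hypothetical.
set.seed(20220726)
n_villages <- 30
sizes <- sample(5:40, n_villages, replace = TRUE)
village_id <- rep(seq_len(n_villages), times = sizes)
x <- rnorm(length(village_id))
u <- rnorm(n_villages, sd = 1)[village_id]  # village random intercepts
y <- 1 + 0.5 * x + u + rnorm(length(village_id))
dat <- data.frame(y = y, x = x, village = factor(village_id))

# 1. Regular linear regression, model-based SEs (ignores clustering)
ols <- lm(y ~ x, data = dat)
se_1 <- coef(summary(ols))[, "Std. Error"]

# 2. Regular linear regression, cluster-robust SEs
se_2 <- coef_test(ols, vcov = "CR2", cluster = dat$village)$SE

# 3. Linear mixed model, model-based SEs (what lmer() reports)
lmm <- lmer(y ~ x + (1 | village), data = dat)
se_3 <- coef(summary(lmm))[, "Std. Error"]

# 4. Linear mixed model, cluster-robust SEs
se_4 <- coef_test(lmm, vcov = "CR2", cluster = dat$village)$SE

# Standard errors for the intercept and slope under each option
se_table <- cbind(ols_model = se_1, ols_robust = se_2,
                  lmm_model = se_3, lmm_robust = se_4)
print(round(se_table, 3))
```

Options 1 and 2 share the same point estimates, as do options 3 and 4; only
the standard errors change within each pair. How far apart the columns end up
will depend on the data at hand.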

So in practice, I would suggest choosing between regular linear regression
and a linear mixed model based on considerations of efficiency and bias from
potential confounding. Then, if using a linear mixed model, choose an
inference approach based on how much you trust the assumptions you lay out
regarding the random effects structure.

James

On Mon, Jul 25, 2022 at 9:38 PM J.D. Haltigan <jhaltiga using gmail.com> wrote:

> Hi:
>
> I am seeking some pedagogical guidance around the conceptual
> relationship--if any--between accounting for non-independence of error
> residuals in cluster designs via cluster-robust SE approaches & formally
> modeling cluster variation (say, villages) using a random effects
> parameter. This would be in contrast to a purely fixed effects design,
> where the random effects of the cluster-level variable are not modeled (but
> cluster-robust SEs are used for the fixed effects).
>
> I realize what I am trying to articulate above may be unclear or garbled,
> but what I am trying to better understand is whether modeling the
> random effects formally obviates the need to account for non-independent
> residuals within clusters (e.g., via cluster-robust SEs). To put it
> another way: does accounting for within-cluster residual
> non-independence fully address heteroskedasticity concerns if random
> effects are not formally modeled?
>
> Thanks for any insights in advance.
>
> -JD