[R-sig-ME] Cluster-robust SEs & random effects -- seeking some clarification

Wed Jul 27 05:41:36 CEST 2022

Many thanks for this detailed and insightful exposition, James.

A few follow-ups:

I had previously tried robust methods with both the robustlmm package and
now yours (clubSandwich), and it appears I don't have the memory needed given
the size of the data, as I receive the following error:
#Error in .local(x, y, ...) :
#Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 89
Googling this error message, I see it is likely due to the computational
demands of sparse-matrix operations, but I was wondering whether there are
any other aspects of this I could explore.

Regarding #4: I am invoking random effects to see how sensitive a fixed
effects model (with cluster-robust SEs) is to formally estimating the
random cluster effect (i.e., the between-cluster variance). In the fixed
effects model, the investigators did include a factor variable (i.e.,
dummies, as you describe below) for the pairs within which clusters are
nested (a pair variable indicating each treatment-control village pair), but
my hunch is that despite this, there are other sources of between-cluster
variance that will likely nullify the point estimates of the fixed effects
(here, the effect of a mask intervention). So, if I am formally modeling the
random cluster component, what does adding cluster-robust SEs provide in
terms of inference, both for the fixed effects and for the random effects?
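
A minimal sketch of what a cluster-robust ("sandwich") standard error computes, assuming the simplest case: OLS with a single predictor and the CR0 estimator, with no small-sample correction. It is written in Python with only the standard library; the helper name `ols_cluster_se` and the toy data are hypothetical. The key step is that the residual-weighted scores are summed within each cluster before being squared, so arbitrary within-cluster correlation and heteroskedasticity are absorbed by the variance estimate rather than assumed away. For a mixed model, the analogous working-model estimator (roughly what clubSandwich computes for lme()/lmer() fits) replaces X'X with X'WX and each cluster score X_j'e_j with X_j'W_j e_j, where W_j is the inverse of the model-implied covariance matrix for cluster j; clubSandwich also offers small-sample corrections such as CR2 that this sketch omits.

```python
import math
from collections import defaultdict


def ols_cluster_se(x, y, cluster):
    """Fit y = a + b*x by OLS; return (coefs, model-based SEs, CR0 SEs).

    Hypothetical illustrative helper. CR0 applies no small-sample correction.
    """
    n = len(y)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    # "Bread": (X'X)^{-1} for the two design columns (1, x)
    det = n * sxx - sx * sx
    bread = [[sxx / det, -sx / det], [-sx / det, n / det]]

    # OLS coefficients: (X'X)^{-1} X'y
    a = bread[0][0] * sy + bread[0][1] * sxy
    b = bread[1][0] * sy + bread[1][1] * sxy
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]

    # Model-based SEs assume independent, homoskedastic errors
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_model = (math.sqrt(sigma2 * bread[0][0]),
                math.sqrt(sigma2 * bread[1][1]))

    # Sum the scores X_i' e_i within each cluster: errors may be correlated
    # inside a cluster but are treated as independent across clusters
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, ei, g in zip(x, resid, cluster):
        scores[g][0] += ei
        scores[g][1] += xi * ei

    # "Meat": sum over clusters of u_j u_j'
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for u in scores.values():
        for r in range(2):
            for c in range(2):
                meat[r][c] += u[r] * u[c]

    def matmul(A, B):
        return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
                for r in range(2)]

    # Sandwich: bread @ meat @ bread
    vcov = matmul(matmul(bread, meat), bread)
    se_cr0 = (math.sqrt(vcov[0][0]), math.sqrt(vcov[1][1]))
    return (a, b), se_model, se_cr0


# Made-up toy data: three clusters of two observations each
coefs, se_model, se_cr0 = ols_cluster_se(
    x=[0, 1, 2, 3, 4, 5],
    y=[1, 3, 2, 5, 4, 6],
    cluster=["a", "a", "b", "b", "c", "c"],
)
print(coefs, se_model, se_cr0)
```

With only three clusters the CR0 estimate is itself very noisy (and tends to be biased downward when clusters are few), which is one reason small-sample corrections exist; the numbers here only illustrate the mechanics.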

I hope this is clear; if not, I will try to clarify.

-JD

On Mon, Jul 25, 2022 at 11:18 PM James Pustejovsky <jepusto using gmail.com> wrote:

> Hi J.D.,
>
> I expect you may find a variety of takes on your question. I'll offer my
> own, as someone who's interested in mixed effects models and cluster robust
> standard errors (http://jepusto.github.io/clubSandwich/). There are really
> two things going on here. First is the choice between using a regular
> linear regression model or using a random effects model. Second is how you
> conduct inference (hypothesis tests, confidence intervals, etc.) on the
> regression coefficients, where you can either use model-based methods or
> cluster-robust methods. Cross those choices and you get four logical
> possibilities:
> 1. Regular linear regression, model-based standard errors (i.e., classical
> OLS t-tests/F-tests/CIs). This is clearly not going to work because it
> doesn't account for dependence in the errors.
> 2. Regular linear regression, cluster-robust standard errors.
> Cluster-robust methods handle dependence in the errors, so you can trust
> the inferences. Regular linear regression may not provide the most
> efficient coefficient estimates if you've got dependent error terms and
> unequally sized clusters. But, on the plus side, you can easily incorporate
> sampling weights. And if you include fixed effects (cluster-specific
> dummies), then this lessens concerns about potential cluster-level
> confounders of the predictor(s) of interest.
> 3. Linear mixed model (aka random effects model), model-based standard
> errors (i.e., what lmer() spits out automatically). Using random effects
> can improve the efficiency of coefficient estimates when you've got
> clusters of varying size. But, you can run into trouble if there are
> cluster-level confounders of the predictor(s) of interest (which is why
> many economists spurn the random effects model). And, it can be tricky to
> incorporate sampling weights if those are relevant. Furthermore, using
> model-based standard errors for inference amounts to asserting that you
> have correctly specified the model in pretty much all respects, including
> any random slopes, correlation between level-1 errors, homoskedasticity of
> errors at each level of the model, etc. Violation of any of the modeling
> assumptions (omitted levels of dependence, omitted random slopes,
> heteroskedastic cluster-level variances) could throw off the inferences to
> some extent.
> 4. Linear mixed model, cluster-robust standard errors (using, e.g.,
> nlme::lme() or lme4::lmer() with the clubSandwich package linked above).
> Using a random effects model to obtain point estimates of the coefficients
> has the same benefits and drawbacks in terms of efficiency, potential
> confounding concerns, etc. But, you can still use cluster-robust methods
> for inference, so that your hypothesis tests and confidence intervals will
> be valid even if some aspect of the random effect structure is
> mis-specified. For example, cluster-robust methods will work even if you've
> omitted a random slope that should actually be there.
>
> So in practice, I would suggest choosing between regular linear regression
> and a linear mixed model based on considerations of efficiency and bias from
> potential confounding. Then, if using a linear mixed model, choose an
> inference approach based on how much you trust the assumptions you lay out
> regarding the random effects structure.
>
> James
>
> On Mon, Jul 25, 2022 at 9:38 PM J.D. Haltigan <jhaltiga using gmail.com> wrote:
>
>> Hi:
>>
>> I am seeking some pedagogical guidance on the conceptual
>> relationship--if any--between accounting for non-independence of error
>> residuals in cluster designs via cluster-robust SE approaches and formally
>> modeling cluster variation (say, villages) using a random effects
>> parameter. This would be in contrast to a purely fixed effects design,
>> where the random effect of the cluster-level variable is not modeled (but
>> cluster-robust SEs are used for the fixed effects).
>>
>> I realize what I am trying to articulate above may be unclear or garbled,
>> but what I am asking/trying to better understand is whether modeling the
>> random effects formally obviates the need to worry about/account for
>> within-cluster error residuals (e.g., via cluster-robust SEs). Trying to
>> articulate my wondering in another way: does accounting for cluster
>> residual non-independence fully address heteroskedasticity concerns IF
>> random effects are not formally modeled?
>>
>> Thanks for any insights in advance.
>>
>> -JD
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>