[R-sig-ME] Cluster-robust SEs & random effects -- seeking some clarification

Thu Jul 28 01:11:52 CEST 2022

>
> Thanks for walking through this with me, James.
>
> I appreciate it. So: the goal of the current project I am involved in is
> to make some conceptual arguments in the context of Generalizability Theory
> about the Bangladesh mask RCT study (you may have heard of the study, here
> is the DOI link: https://www.science.org/doi/10.1126/science.abi9069).
>
> The study randomized 600 villages (300 treatment, 300 control).
>
> The 'pairID' variable is a control the authors used to account for fixed
> effects of treatment-control pair (so, we have 300 pairIDs in total). In
> the original investigation, the authors did not invoke a random effects
> model (but did use the pairIDs to control for fixed effects as noted and
> with robust SEs). Thus, in the original investigation there was *no*
> specification of a random effects model for the 'cluster' variable. We know
> from some other work there were some biases in village mapping and other
> possible sources of between-cluster variation that might be anticipated to
> have influence--at the random intercepts level--so we are looking into how
> specifying 'cluster' as a random effect might change the fixed effects
> estimates for the treatment intervention effect. In the Hamaker et al.
> language, it is indeed a 'random intercepts' only model. Given this,
> however, does it also make sense to include the cluster robust SEs for the
> fixed effects which would account for possible heterogeneity of treatment
> effects (i.e., slopes) across clusters?
>
> Bottom line: in their original analyses, clusters are seen as
> interchangeable from a conceptual perspective (rather than drawn from a
> random universe of observations). When one scales up evidence to a universe
> of observations that are random (as they would be in the intended universe
> of inference in the real-world), then we are better positioned, I think, to
> adjudicate whether the mask intervention effect is 'practically
> significant' (in addition to whether the focal effect remains marginally
> significant from a frequentist perspective).
>
> Hope this is somewhat clarifying.
>
>
> On Wed, Jul 27, 2022 at 12:05 PM James Pustejovsky <jepusto using gmail.com>
> wrote:
>
>> Hi J.D.,
>>
>> Responses inline below.
>>
>> James
>>
>> On Tue, Jul 26, 2022 at 10:42 PM J.D. Haltigan <jhaltiga using gmail.com>
>> wrote:
>>
>>> Many thanks for this detailed and insightful exposition, James.
>>>
>>> A few follow-ups:
>>>
>>> I had previously tried cluster-robust SEs with both the robustlmm package
>>> and now yours, and it appears I don't have the memory needed given the size
>>> of the data, as I receive the following error:
>>> #Error in .local(x, y, ...) :
>>> #Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 89
>>> In Googling this error message, I see it is likely due to the computational
>>> demands of a sparse matrix estimation, but was wondering if there were any
>>> other aspects of this I could explore.
>>>
>>>
>> To figure out what's going on, we need to know the dimensions of your
>> model. How many coefficients (fixed effects) are in your regression
>> specification? How many clusters do you have? What is the range of cluster
>> sizes?
>>
>>
>>> With regard to #4: I am invoking random effects to see how sensitive a fixed
>>> effects model (with cluster robust SEs) is to formally estimating the
>>> random cluster effect (so between-cluster variance). In the fixed effects
>>> model, the investigators did include a factor variable (i.e., cluster
>>> dummies as you describe below) that is nested within cluster (so, a pair
>>> variable indicating treatment-control village), but my suspicion is that
>>> despite this, there are other sources of between-cluster variance that will
>>> likely nullify the point estimates of the fixed effects (in this case, a
>>> mask intervention).
>>
>>
>> I'm not sure what you mean by "other sources of between-cluster variance
>> that will likely nullify the point estimates of the fixed effects." Can you
>> explain in a bit more detail?
>>
>>
>>> So, if I am formally modeling the random cluster component, what does
>>> adding cluster-robust SEs in this case provide in terms of inference--both
>>> for the fixed effects and for the random effects?
>>>
>>
>> Cluster-robust SEs are only relevant for inference on the regression
>> coefficients (fixed effects)--not for the random effects. For the fixed
>> effects, using cluster-robust SEs provides robustness to misspecification
>> of the random effects model, such as omission of a random slope that should
>> be there. In the context of your example, there might be heterogeneity of
>> intervention effects across pairs of villages. A random effects model that
>> only includes random intercepts for each village will not "catch" this sort
>> of treatment effect heterogeneity, and model-based SEs could be invalidated
>> by it. Cluster-robust SEs will account for that sort of heterogeneity (and
>> be larger than the model-based SEs as a consequence).
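
The contrast between model-based and cluster-robust SEs described above can be written out as follows. This is a sketch in assumed notation, none of which comes from the thread: y_j, X_j, and e_j are the outcomes, design matrix, and residuals for cluster j, and W_j is the inverse of the working covariance matrix implied by the fitted random effects model.

```latex
\hat\beta = \Big(\sum_j X_j^\top W_j X_j\Big)^{-1} \sum_j X_j^\top W_j y_j,
\qquad e_j = y_j - X_j \hat\beta

% Model-based variance: trusts the random effects specification behind W_j
V^{\text{model}} = \Big(\sum_j X_j^\top W_j X_j\Big)^{-1}

% Cluster-robust (CR0) sandwich variance: uses the observed residuals instead
V^{\text{CR0}} = \Big(\sum_j X_j^\top W_j X_j\Big)^{-1}
\Big(\sum_j X_j^\top W_j \, e_j e_j^\top \, W_j X_j\Big)
\Big(\sum_j X_j^\top W_j X_j\Big)^{-1}
```

The model-based form is valid only when the inverse of W_j matches the true covariance of y_j, which fails if, for example, a random slope is omitted. The sandwich form drops that requirement by plugging in e_j e_j^T for each cluster. Small-sample corrections (such as CR1 or CR2) rescale the residuals and are left out of this sketch.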
>>
>
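
A small simulation can illustrate the point about treatment effect heterogeneity. The sketch below is not the analysis from the study or from any package discussed in this thread. It is a simplified stand-in: a within-pair (pair fixed effects) OLS fit, where the classical OLS SE plays the role of the "model-based" SE and a CR0 sandwich SE is clustered by pair. The function names and all parameter values (number of individuals per village, effect size, spread of the effect across pairs) are hypothetical choices for illustration.

```python
import math
import random


def simulate(n_pairs=300, n_per_village=20, effect=0.1, effect_sd=0.5, seed=1):
    """Simulate paired villages whose treatment effect varies across pairs.

    All parameter values are hypothetical, not taken from the study.
    """
    rng = random.Random(seed)
    rows = []  # each row: (pair id, treatment indicator, outcome)
    for pair in range(n_pairs):
        pair_level = rng.gauss(0.0, 1.0)  # absorbed by pair fixed effects
        pair_effect = effect + rng.gauss(0.0, effect_sd)  # heterogeneous effect
        for treated in (0, 1):
            for _ in range(n_per_village):
                y = pair_level + pair_effect * treated + rng.gauss(0.0, 1.0)
                rows.append((pair, treated, y))
    return rows


def within_pair_fit(rows):
    """Within-pair OLS of outcome on treatment.

    Returns (estimate, classical SE, CR0 cluster-robust SE clustered by pair).
    """
    by_pair = {}
    for pair, treated, y in rows:
        by_pair.setdefault(pair, []).append((treated, y))

    # Demean treatment and outcome within each pair to sweep out pair effects.
    demeaned = {}
    for pair, obs in by_pair.items():
        t_bar = sum(t for t, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        demeaned[pair] = [(t - t_bar, y - y_bar) for t, y in obs]

    sxx = sum(t * t for obs in demeaned.values() for t, _ in obs)
    sxy = sum(t * y for obs in demeaned.values() for t, y in obs)
    beta = sxy / sxx

    # Classical SE: assumes independent, homoskedastic errors.
    n_obs = len(rows)
    n_pairs = len(by_pair)
    rss = sum((y - beta * t) ** 2 for obs in demeaned.values() for t, y in obs)
    sigma2 = rss / (n_obs - n_pairs - 1)
    se_classical = math.sqrt(sigma2 / sxx)

    # CR0 sandwich: square each pair's score total, so residuals may be
    # arbitrarily correlated within a pair.
    meat = sum(
        sum(t * (y - beta * t) for t, y in obs) ** 2 for obs in demeaned.values()
    )
    se_cluster = math.sqrt(meat) / sxx

    return beta, se_classical, se_cluster


beta, se_classical, se_cluster = within_pair_fit(simulate())
print(
    f"estimate={beta:.3f}  classical SE={se_classical:.3f}  "
    f"cluster-robust SE={se_cluster:.3f}"
)
```

Setting `effect_sd=0` removes the heterogeneity, which should bring the two SEs close together. That comparison is one way to see that the gap comes from the unmodeled variation in the treatment effect rather than from the sandwich formula itself.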
