[R-sig-ME] Controlling for self-selection bias / endogeneity in mixed models

Mon Apr 13 14:54:00 CEST 2020

Just a short note:
I just read your previous answer to John, in which you referred to my GitHub
page on the REWB model. The vignette in the parameters package is
actually just an updated version of that GitHub page.

> I scaled "experience" originally to address convergence issues, but as the
> R implementation of scaling also centers, I also addressed the collinearity
> between the main and interaction variables. But I can center without scaling
> of course.

I'm not 100% sure how best to handle interaction effects involving
time-varying predictors, so my comment was meant as a pointer rather than a
definite suggestion. In a recent paper, we model the interaction
between group (or treatment) and the between-effect only, and use the
within-predictor as a further covariate. I just read one or two papers
in which different ways of group-meaning and de-meaning were suggested.
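
A minimal sketch of the specification described in the paragraph above (interaction with the between-effect only, within-effect as a plain covariate). The data are simulated and all names (`experience_between`, `experience_within`, a 0/1 `treatment`) are hypothetical; this is not the model from the paper mentioned, only an illustration of the formula structure:

```r
library(nlme)  # recommended package shipped with standard R installations

set.seed(42)
n_subj <- 40
n_obs <- 6

d <- data.frame(subject = factor(rep(seq_len(n_subj), each = n_obs)))
# Hypothetical subject-constant treatment indicator (0 = control, 1 = treated)
d$treatment <- rep(rep(c(0, 1), length.out = n_subj), each = n_obs)
# Time-varying predictor: a subject-level mean plus occasion-level noise
d$experience <- rep(rnorm(n_subj, mean = 5), each = n_obs) +
  rnorm(n_subj * n_obs)
d$y <- 1 + 0.5 * d$treatment + 0.3 * d$experience +
  rep(rnorm(n_subj), each = n_obs) + rnorm(n_subj * n_obs)

# Between-effect = subject mean; within-effect = deviation from that mean
d$experience_between <- ave(d$experience, d$subject)
d$experience_within <- d$experience - d$experience_between

# Treatment interacts with the between-effect only;
# the within-effect enters as an additional covariate
m_int <- lme(
  y ~ treatment * experience_between + experience_within,
  random = ~ 1 | subject,
  data = d
)
round(fixef(m_int), 3)
```

Whether the interaction should involve the within-effect, the between-effect, or both depends on the research question; the sketch only shows one of these options.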

> Your first link did not work for me

It's actually the same link as in the first paragraph of the vignette I
posted.

Best
Daniel

-----Original Message-----
From: Slaughter, Kelly <KELLY.SLAUGHTER using tcu.edu> 
Sent: Monday, April 13, 2020 13:53
To: Daniel Lüdecke <d.luedecke using uke.de>; r-sig-mixed-models using r-project.org
Subject: RE: [R-sig-ME] Controlling for self-selection bias / endogeneity in
mixed models

Thank you, Daniel. 

Yes, I have the time invariant "treatment" as a level 1/fixed effect, and am
further hypothesizing that "treatment" is more important as one gains
"experience" (thus an interaction variable). The variable I am considering
de-meaning / group meaning is "experience".

I scaled "experience" originally to address convergence issues, but as the R
implementation of scaling also centers, I also addressed the collinearity
between the main and interaction variables. But I can center without scaling
of course. 
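
For reference, a small sketch of the difference between the two calls in base R; the vector below is a made-up stand-in for "experience":

```r
# Hypothetical toy values standing in for "experience"
experience <- c(2, 4, 6, 8, 10)

# Default: scale() subtracts the mean AND divides by the standard deviation
z_scored <- as.numeric(scale(experience))

# scale = FALSE only subtracts the mean, so the original units are kept
centered <- as.numeric(scale(experience, scale = FALSE))

centered
z_scored
```

Centering alone keeps coefficients interpretable in the original units of the predictor, while the default also changes the unit to standard deviations.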

Your first link did not work for me, but the general site referenced in the
link as well as the "parameters" links look potentially quite helpful. I
will review/run your gist to better understand the impact of within/between
and REWB, thank you very much!

-----Original Message-----
From: Daniel Lüdecke <d.luedecke using uke.de> 
Sent: Monday, April 13, 2020 2:23 AM
To: Slaughter, Kelly <KELLY.SLAUGHTER using tcu.edu>;
r-sig-mixed-models using r-project.org
Subject: AW: [R-sig-ME] Controlling for self-selection bias / endogeneity in
mixed models

Hi Kelly,

> Not an issue for me - I am not concerned with level 2, I include subject
> to address the IID violation but am interested in population, not subject,
> performance.

If your variable is practically time-constant (or time-invariant), you can
add it as a normal predictor, and you don't need to de-mean and group-mean
it (i.e. separate it into within- and between-effects). In your case, if
"treatment" is practically constant over time, you just include it "as is"
in your model.

The main reason for heterogeneity bias, if I understood Bell et al.
correctly, is that the coefficient of a time-varying variable is a weighted
average of its within- and between-effects (or more generally: this applies
to level-1 predictors that also have a level-2 effect and thus might
correlate with the group variable from the random effects). Simply
decomposing time-varying predictors into their within- and between-effects
indeed gives you the same consistent estimates as a "fixed effects" model,
while the REWB model has many more benefits.
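
The decomposition can be sketched in a few lines of base R plus nlme. This is an illustration on simulated data with hypothetical variable names, not the code from the gist or the parameters package; the subject trait `u` plays the role of an unobserved confounder:

```r
library(nlme)  # recommended package shipped with standard R installations

set.seed(1)
n_subj <- 30
n_obs <- 8
subject <- factor(rep(seq_len(n_subj), each = n_obs))

# Unobserved subject trait that drives both the predictor and the outcome
u <- rep(rnorm(n_subj), each = n_obs)
experience <- 5 + 2 * u + rnorm(n_subj * n_obs)
y <- 1 + 0.5 * experience + 3 * u + rnorm(n_subj * n_obs)
d <- data.frame(subject, experience, y)

# Group-mean (between) and de-meaned (within) components
d$experience_between <- ave(d$experience, d$subject)
d$experience_within <- d$experience - d$experience_between

# Random-intercept model with the raw predictor (mixes within and between)
m_naive <- lme(y ~ experience, random = ~ 1 | subject, data = d)

# REWB model: separate within- and between-effects
m_rewb <- lme(y ~ experience_within + experience_between,
              random = ~ 1 | subject, data = d)

# "Fixed effects" model: one dummy per subject
m_fe <- lm(y ~ experience + subject, data = d)

c(
  naive = fixef(m_naive)[["experience"]],
  rewb_within = fixef(m_rewb)[["experience_within"]],
  fixed_effects = coef(m_fe)[["experience"]]
)
```

With a random intercept only, the within coefficient of the REWB model and the coefficient from the dummy-variable model should agree up to numerical tolerance, because the de-meaned predictor is orthogonal to everything that is constant within a subject.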

Based on a short blog post I found
(https://shouldbewriting.netlify.com/posts/2019-10-21-accounting-for-within-and-between-subject-effect/)
I have written a small gist that produces plots
and coefficient tables for teaching repeated measurement with mixed models,
which shows this:
https://gist.github.com/strengejacke/c53e1fa1d7cf41e4737f3ab044a67d09

One thing I would take into consideration is the interaction term. There are
several ways to handle a time-varying predictor that is used in an
interaction. I would not scale it (as in your example), but rather think
about whether you're interested in the interaction with the within-effect,
the between-effect, or both. See the 'Details' in the "parameters::demean()"
help for some more explanation and references
(https://easystats.github.io/parameters/reference/demean.html).

To your 2nd question: See my gist above. FE models estimate the
within-effect; however, this effect may (or: is very likely to) vary between
group levels (i.e. subjects). Thus, including the within-effect as a random
slope makes sense, since it captures the variability between groups (but
leads to increased SEs because it accounts better for the uncertainty in the
random effects). See also this vignette:
https://easystats.github.io/parameters/articles/demean.html
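
A hedged sketch of the random-slope point, again on simulated data with hypothetical names. The simulation deliberately lets the within-effect differ between subjects, so the comparison of standard errors holds for this setup and is not a general guarantee:

```r
library(nlme)  # recommended package shipped with standard R installations

set.seed(123)
n_subj <- 50
n_obs <- 10
subject <- factor(rep(seq_len(n_subj), each = n_obs))
experience <- rep(rnorm(n_subj, mean = 5), each = n_obs) +
  rnorm(n_subj * n_obs)

d <- data.frame(subject, experience)
d$experience_between <- ave(d$experience, d$subject)
d$experience_within <- d$experience - d$experience_between

# Subject-specific intercepts and subject-specific within-slopes
intercept_i <- rep(rnorm(n_subj, mean = 1, sd = 1), each = n_obs)
slope_i <- rep(rnorm(n_subj, mean = 0.5, sd = 0.7), each = n_obs)
d$y <- intercept_i + slope_i * d$experience_within +
  0.3 * d$experience_between + rnorm(n_subj * n_obs)

# REWB with random intercept only
m_ri <- lme(y ~ experience_within + experience_between,
            random = ~ 1 | subject, data = d)

# REWB with the within-effect as random slope
m_rs <- lme(y ~ experience_within + experience_between,
            random = ~ experience_within | subject, data = d)

# Standard error of the within-effect under both specifications
se_within <- c(
  random_intercept = summary(m_ri)$tTable["experience_within", "Std.Error"],
  random_slope = summary(m_rs)$tTable["experience_within", "Std.Error"]
)
se_within
```

In this setup the random-slope model reports the larger standard error for the within-effect, since the between-subject variation of the slope is now part of the uncertainty about its average.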

Best
Daniel

-----Original Message-----
From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> On
Behalf Of Slaughter, Kelly
Sent: Monday, April 13, 2020 01:34
To: r-sig-mixed-models using r-project.org
Subject: [R-sig-ME] Controlling for self-selection bias / endogeneity in
mixed models

Hi all -

I have a concern regarding self-selection/omitted variable bias. I have a
longitudinal/repeated measures model, theorizing about a relationship
between treatment/control and effort, represented in nlme syntax as:

EQ 1) log(effort measured in time) ~ treatment*scale(experience), random =
~1|subject
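
For readers who want to run something, EQ 1 could look roughly like this as an actual nlme call. The data frame and its columns (`effort`, `treatment`, `experience`, `subject`) are simulated placeholders, not the data discussed in this thread:

```r
library(nlme)  # recommended package shipped with standard R installations

set.seed(2020)
n_subj <- 40
n_obs <- 8

d <- data.frame(subject = factor(rep(seq_len(n_subj), each = n_obs)))
# Hypothetical subject-constant treatment/control indicator
d$treatment <- factor(
  rep(rep(c("control", "treated"), length.out = n_subj), each = n_obs)
)
# Hypothetical experience: occasion counter within each subject
d$experience <- rep(seq_len(n_obs), times = n_subj)
# Positive effort values, generated on the log scale
d$effort <- exp(
  3 - 0.2 * (d$treatment == "treated") - 0.05 * d$experience +
    rep(rnorm(n_subj, sd = 0.3), each = n_obs) +
    rnorm(n_subj * n_obs, sd = 0.2)
)

# EQ 1: random intercept per subject, treatment x scaled experience
m_eq1 <- lme(
  log(effort) ~ treatment * scale(experience),
  random = ~ 1 | subject,
  data = d
)
summary(m_eq1)$tTable
```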

Treatment/control is selected by the subject, it is not randomized, thus
raising endogeneity concerns. My background is applied econ, so as I learn
the mixed model domain, I expected to find the mixed model equivalent of
instrumental variables/inverse Mills ratio, etc. Yet there is surprisingly
(to me) limited material addressing this issue. The best reference material
I found is in fact a thread in this mailing list from October 2016 and the
papers referenced within, leading to Bell, Fairbrother, and Jones (2019). My
first impression is that I should employ a within-between random effects
(REWB) model -

EQ 2) log(effort measured in time) ~ treatment*scale(experience) +
experience_between + experience_within, random = experience_within +
scale(experience) | subject

If I understand correctly, the intuition is that the addition of a group
mean explanatory variable "breaks out" the variability that would be
associated with an omitted variable / error term. Per Bell et al, "there can
be no correlation between level 1 variables included in the model and the
level 2 random effects...unchanging and/or unmeasured characteristics of an
individual (such as intelligence, ability, etc.) will be controlled out of
the estimate of the within effect."

So, no concern between the subject (level 2) and treatment (level 1) via
REWB, wonderful!

Bell et al caution, "...in a REWB/Mundlak models, unmeasured level 2
characteristics can cause bias in the estimates of between effects and
effects of other level 2 variables."

Not an issue for me - I am not concerned with level 2, I include subject to
address the IID violation but am interested in population, not subject,
performance.

Bell et al continue, "However, unobserved time-varying characteristics can
still cause biases at level 1 in either an FE or a REWB/Mundlak model."

Though conceptually my treatment variable is time-varying (it can change
across time within a subject), as a practical/empirical matter, the
treatment is unchanging within the subject - subjects have no reason to
change / would prefer to keep the choice constant. Of 80k records, treatment
switches within a subject occur in about a dozen records.

So, I think I have my solution. However, if a reviewer is not happy with the
within / between REWB solution (worried about the level 1 bias), I can
further defend EQ 2 via its random coefficient/slope, if I understand the
Oct 2016 thread correctly.

So, my questions are:

(1) Is the above correctly reasoned?

(2) If the random slope model is a further defense against self-selection
bias, could someone provide an intuitive explanation as to why? Is the idea
that by allowing slopes to vary, there is no endogeneity problem to solve as
the very structure of the model makes the correlated errors concern
irrelevant?

Other solutions I explored include a Mundlak model, but per Bell et al., the
Mundlak models are not meaningful for repeated measures. Also, the brms
package appears to support mixed modeling using instrumental
variables, something I am more comfortable with per my background, but
strong instrumental variables are hard to find in the wild!

Thank you! - Kelly

_______________________________________________
R-sig-mixed-models using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

--

_____________________________________________________________________

Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen
Rechts; Gerichtsstand: Hamburg | www.uke.de
Vorstandsmitglieder: Prof. Dr. Burkhard Göke (Vorsitzender), Joachim Prölß,
Prof. Dr. Blanche Schwappach-Pignataro, Marya Verdel
_____________________________________________________________________

SAVE PAPER - THINK BEFORE PRINTING
