[R-sig-ME] instrumenting for an Xj correlated with Uj

Thu Aug 30 18:40:47 CEST 2012

Dear list,

A few months ago I asked about a model where a level-2 covariate (Xj) is correlated with the level-2 random effects (Uj), such that the estimated coefficient on Xj is biased--see <https://stat.ethz.ch/pipermail/r-sig-mixed-models/2012q2/018116.html>.

A couple people noted that an instrumental variable approach could help recover a less-biased estimate for the coefficient on Xj. However, I can find very little documentation about the use of instrumental variables in a multilevel/mixed modelling context. Following what I've read about the "2SLS" estimator (used above all in economics), I've adapted the original code I posted--see below--to perform a makeshift two-stage procedure which indeed performs very well in recovering the coefficient on Xj.

However, from what I've read, the SEs in the second stage of a two-stage instrumental variables approach need adjusting… yet I can't follow how, and especially in a multilevel context. Can anyone suggest any helpful resources, and/or additions to the code below? Would recourse to resampling with "refit(simulate(mod))" be a usable approach (in an lme4 context)?

Another issue is that the estimated level-2 variance (for "group") is much too high (whereas it's somewhat too low in a naive model without using an instrument).

Any input would be much appreciated.

Cheers,
Malcolm

library(lme4); library(multicore)
N <- 40 # set number of groups
T <- 10 # set number of observations per group
dgp <- function(N, T) {
	dat <- data.frame(group=1:N, Zj=rnorm(N), Uj=rnorm(N))[rep(1:N,each=T),]
	dat$Xj <- dat$Zj + dat$Uj	# Xj is correlated with Uj, but Zj is not
	dat$y <- 1 + dat$Xj + dat$Uj + rnorm(N*T)
	dat
	}
fit <- function(dat=dgp(N=N, T=T)) {
	bmod <- lmer(y ~ Xj + (1 | group), dat) # fit a naive (two-level) model to estimate the effect of Zj
	ivmod <- lm(Xj ~ Zj, dat[!duplicated(dat$group),]) # 1st stage: regress Xj on the instrument Zj (requires just one level)
	mod <- lmer(y ~ fitted(ivmod)[rep(1:N,each=T)] + (1 | group), dat) # 2nd stage: regress y on the fitted values from the 1st stage model
	c(fixef(bmod), as.numeric(summary(bmod)@REmat[,3]), fixef(mod), as.numeric(summary(mod)@REmat[,3]))
	}
res1 <- do.call("rbind", mclapply(1:100, function(yy) fit())) # run simulations
apply(res1, 2, function(x) c(mean(x),median(x))) # coefficient on Xj is much closer to 1, group variance is biased in opposite ways, residual variance is identical
apply(res1[,c(2,6)], 2, sd) # variance is higher for IV