[R-sig-ME] instrumenting for an Xj correlated with Uj
Malcolm Fairbrother
m.fairbrother at bristol.ac.uk
Thu Aug 30 18:40:47 CEST 2012
Dear list,
A few months ago I asked about a model where a level-2 covariate (Xj) is correlated with the level-2 random effects (Uj), such that the estimated coefficient on Xj is biased--see <https://stat.ethz.ch/pipermail/r-sig-mixed-models/2012q2/018116.html>.
A couple people noted that an instrumental variable approach could help recover a less-biased estimate for the coefficient on Xj. However, I can find very little documentation about the use of instrumental variables in a multilevel/mixed modelling context. Following what I've read about the "2SLS" estimator (used above all in economics), I've adapted the original code I posted--see below--to perform a makeshift two-stage procedure which indeed performs very well in recovering the coefficient on Xj.
However, from what I've read, the SEs in the second stage of a two-stage instrumental variables approach need adjusting… yet I can't follow how, and especially in a multilevel context. Can anyone suggest any helpful resources, and/or additions to the code below? Would recourse to resampling with "refit(simulate(mod))" be a usable approach (in an lme4 context)?
Another issue is that the estimated level-2 variance (for "group") is much too high (whereas it's somewhat too low in a naive model without using an instrument).
Any input would be much appreciated.
Cheers,
Malcolm
library(lme4); library(multicore)
N <- 40 # set number of groups
T <- 10 # set number of observations per group
dgp <- function(N, T) {
dat <- data.frame(group=1:N, Zj=rnorm(N), Uj=rnorm(N))[rep(1:N,each=T),]
dat$Xj <- dat$Zj + dat$Uj # Xj is correlated with Uj, but Zj is not
dat$y <- 1 + dat$Xj + dat$Uj + rnorm(N*T)
dat
}
fit <- function(dat=dgp(N=N, T=T)) {
bmod <- lmer(y ~ Xj + (1 | group), dat) # fit a naive (two-level) model to estimate the effect of Zj
ivmod <- lm(Xj ~ Zj, dat[!duplicated(dat$group),]) # 1st stage: regress Xj on the instrument Zj (requires just one level)
mod <- lmer(y ~ fitted(ivmod)[rep(1:N,each=T)] + (1 | group), dat) # 2nd stage: regress y on the fitted values from the 1st stage model
c(fixef(bmod), as.numeric(summary(bmod)@REmat[,3]), fixef(mod), as.numeric(summary(mod)@REmat[,3]))
}
res1 <- do.call("rbind", mclapply(1:100, function(yy) fit())) # run simulations
apply(res1, 2, function(x) c(mean(x),median(x))) # coefficient on Xj is much closer to 1, group variance is biased in opposite ways, residual variance is identical
apply(res1[,c(2,6)], 2, sd) # variance is higher for IV
More information about the R-sig-mixed-models
mailing list