[R-sig-ME] confidence interval question

Ben Bolker bbolker at gmail.com
Tue Mar 22 22:10:45 CET 2011

Sarah Elmendorf <scelmend at ...> writes:

>
> Dear R-sig mixed users,
> I am trying to plot the results of a linear mixed model, and would like
> to add confidence intervals to show how the response varies with time
> (fixed effect).  I am using only random intercepts (not slopes).  My
> model also includes a weight term to account for precision.
>
> I came across several ways to do this:
> 1.the recommendations on the GLMMfaq page
> 2.Solution based on Gelman and Hill's "simulate" function
> 3. MCMC-based confidence intervals in the package "languageR"
>
> I am not a statistician, but I thought these should all give me roughly
> the same answer.  For the non-weighted test case, they do.  When the
> model includes weights, the CIs using method (1) are MUCH narrower than
> using (2) or (3).  Why is this?  And can anyone advise on which method
> is preferable or how the interpretation should differ based on which
> method I use??
>
> Reproducible example follows.  Hopefully it will illuminate where I am
> stuck or someone can point to where I've gone wrong.
>
> Thanks so much for your help.
> Sarah

This is stumping me a bit, to be honest.

The languageR approach should be the gold standard here:
it is (a) using code built into lme4 (mcmcsamp) at its core,
which ought to know what to do with weights unless there is
a serious bug; (b) accounting for uncertainty in the random
effects parameters, which neither of the other methods does.
(In principle, you should be able to build a solution that
looks a lot like your sim- and vcov-based solutions that is
based on mcmcsamp() instead of going through all of the
languageR machinery ...)

I don't know exactly what the 'sim' function is doing,
except that it is obviously getting larger variances for the
parameters than simulating directly from vcov() does:
compare

var(coef(sim.fm1)\$fixef)

and

vcov(fm1)

...

I don't see why setting 'weights' or not makes a difference, unless
there's something deep in the guts of the 'sim' method for mer objects
that takes them into account ...

[removing > so that Gmane doesn't complain]

library(lme4)
library(ggplot2) # Plotting
library(MEMSS) # for Orthodont

#modified example slightly to remove subject specific slopes, use only
#the resulting example is structured more like my actual data

fm1 = lmer(
formula = distance ~ age + (1|Subject)
, data = Orthodont[Orthodont\$Sex=="Male",], weights=c(1:64)
)

#unweighted version below gives similar answers for all 3 methods of
making CIs
#fm1 = lmer(
#    formula = distance ~ age + (1|Subject)
#   , data = Orthodont[Orthodont\$Sex=="Male",]
#)

newdat <- expand.grid(
age=c(8,10,12,14)
, distance = 0
)
mm = model.matrix(terms(fm1),newdat)
newdat\$distance = mm %*% fixef(fm1)
pvar1 <- diag(mm %*% tcrossprod(vcov(fm1),mm))
tvar1 <- pvar1+VarCorr(fm1)\$Subject
newdat <- data.frame(
newdat
, plo = newdat\$distance-2*sqrt(pvar1)
, phi = newdat\$distance+2*sqrt(pvar1)
, tlo = newdat\$distance-2*sqrt(tvar1)
, thi = newdat\$distance+2*sqrt(tvar1)
)

#using languageR

library (languageR)

mcmc = pvals.fnc(fm1, nsim=1000, withMCMC=TRUE)
mcmcvers<-plotLMER.fnc(fm1, mcmcMat=mcmc\$mcmc, withList=TRUE)

detach (package:languageR)
detach (package:zipfR)

#using Gelman and Hill sim function
library ("arm")

sim.fm1<-sim(fm1, n.sims=1000)
newdat.arm <- expand.grid(
age=c(8,10,12,14)
, distance = 0
)

#estimate mean distance per age for each simulation
mm.dat= model.matrix(terms(fm1),newdat.arm)

You can replace the loop here:

outputfull <- data.frame(y=c(apply(coef(sim.fm1)\$fixef,
1,function(v) mm.dat%*%v)),
age=newdat.arm\$age,sim=rep(1:1000,each=4))

#take the 0.025 and 0.975 quantiles of predictions over all simulations
outputfull<-as.data.frame (outputfull)
names (outputfull)<-c("distance", "age", "sim")
newdat.arm\$ci.lb<-rep(NA, nrow (newdat.arm))
newdat.arm\$ci.ub<-rep(NA, nrow (newdat.arm))
newdat.arm\$mean<-rep(NA, nrow (newdat.arm))
for (k in 1:nrow(newdat.arm))    {
age<-newdat.arm\$age[k]

newdat.arm\$ci.lb[k]<-quantile(outputfull\$distance[outputfull\$age==age],0.025)
newdat.arm\$mean[k]<-mean (outputfull\$distance[outputfull\$age==age])
newdat.arm\$ci.ub[k]<-quantile
(outputfull\$distance[outputfull\$age==age],0.975)
}

detach (package:arm)
detach (package:abind)
detach (package:car)
detach (package:nnet)
detach (package:survival)
detach (package:splines)

#possible to plot them all out and see the differences in CIs, but
easiest to just look at the intervals for age=8 as a point example where
there are clear differences
mcmcvers\$age\$lower[mcmcvers\$age\$x==8]
mcmcvers\$age\$upper[mcmcvers\$age\$x==8]

newdat.arm\$ci.lb[newdat.arm\$age==8]
newdat.arm\$ci.ub[newdat.arm\$age==8]

newdat\$plo[newdat\$age==8]
newdat\$phi[newdat\$age==8]