[R-sig-ME] confidence interval question

Tue Mar 22 22:10:45 CET 2011

Sarah Elmendorf <scelmend at ...> writes:

> 
> Dear R-sig mixed users,
> I am trying to plot the results of a linear mixed model, and would like 
> to add confidence intervals to show how the response varies with time 
> (fixed effect).  I am using only random intercepts (not slopes).  My 
> model also includes a weight term to account for precision.
> 
> I came across several ways to do this:
> 1.the recommendations on the GLMMfaq page
> 2.Solution based on Gelman and Hill's "simulate" function
> 3. MCMC-based confidence intervals in the package "languageR"
> 
> I am not a statistician, but I thought these should all give me roughly 
> the same answer.  For the non-weighted test case, they do.  When the 
> model includes weights, the CIs using method (1) are MUCH narrower than 
> using (2) or (3).  Why is this?  And can anyone advise on which method 
> is preferable or how the interpretation should differ based on which 
> method I use??
> 
> Reproducible example follows.  Hopefully it will illuminate where I am 
> stuck or someone can point to where I've gone wrong.
> 
> Thanks so much for your help.
> Sarah

  This is stumping me a bit, to be honest.

  The languageR approach should be the gold standard here:
it is (a) using code built into lme4 (mcmcsamp) at its core,
which ought to know what to do with weights unless there is
a serious bug; (b) accounting for uncertainty in the random
effects parameters, which neither of the other methods does.
(In principle, you should be able to build a solution that
looks a lot like your sim- and vcov-based solutions that is
based on mcmcsamp() instead of going through all of the
languageR machinery ...)

  I don't know exactly what the 'sim' function is doing,
except that it is obviously getting larger variances for the
parameters than simulating directly from vcov() does:
compare

var(coef(sim.fm1)$fixef)

and

vcov(fm1) 

...

  I don't see why setting 'weights' or not makes a difference, unless
there's something deep in the guts of the 'sim' method for mer objects
that takes them into account ...

[removing > so that Gmane doesn't complain] 

 library(lme4)
 library(ggplot2) # Plotting
 library(MEMSS) # for Orthodont

 #modified example slightly to remove subject specific slopes, use only 
 males and added weights
 #the resulting example is structured more like my actual data

 fm1 = lmer(
      formula = distance ~ age + (1|Subject)
      , data = Orthodont[Orthodont$Sex=="Male",], weights=c(1:64)
 )

 #unweighted version below gives similar answers for all 3 methods of 
 making CIs
 #fm1 = lmer(
 #    formula = distance ~ age + (1|Subject)
 #   , data = Orthodont[Orthodont$Sex=="Male",]
 #)

 newdat <- expand.grid(
      age=c(8,10,12,14)
      , distance = 0
 )
 mm = model.matrix(terms(fm1),newdat)
 newdat$distance = mm %*% fixef(fm1)
 pvar1 <- diag(mm %*% tcrossprod(vcov(fm1),mm))
 tvar1 <- pvar1+VarCorr(fm1)$Subject[1]
 newdat <- data.frame(
      newdat
      , plo = newdat$distance-2*sqrt(pvar1)
      , phi = newdat$distance+2*sqrt(pvar1)
      , tlo = newdat$distance-2*sqrt(tvar1)
      , thi = newdat$distance+2*sqrt(tvar1)
 )

 #using languageR

 library (languageR)

 mcmc = pvals.fnc(fm1, nsim=1000, withMCMC=TRUE)
 mcmcvers<-plotLMER.fnc(fm1, mcmcMat=mcmc$mcmc, withList=TRUE)

 detach (package:languageR)
 detach (package:zipfR)

 #using Gelman and Hill sim function
 library ("arm")

 sim.fm1<-sim(fm1, n.sims=1000)
 newdat.arm <- expand.grid(
      age=c(8,10,12,14)
      , distance = 0
 )

 #estimate mean distance per age for each simulation
 mm.dat= model.matrix(terms(fm1),newdat.arm)

You can replace the loop here:

outputfull <- data.frame(y=c(apply(coef(sim.fm1)$fixef,
	   1,function(v) mm.dat%*%v)),
           		 age=newdat.arm$age,sim=rep(1:1000,each=4))

 #take the 0.025 and 0.975 quantiles of predictions over all simulations
 outputfull<-as.data.frame (outputfull)
 names (outputfull)<-c("distance", "age", "sim")
 newdat.arm$ci.lb<-rep(NA, nrow (newdat.arm))
 newdat.arm$ci.ub<-rep(NA, nrow (newdat.arm))
 newdat.arm$mean<-rep(NA, nrow (newdat.arm))
 for (k in 1:nrow(newdat.arm))    {
      age<-newdat.arm$age[k]

 newdat.arm$ci.lb[k]<-quantile(outputfull$distance[outputfull$age==age],0.025)
      newdat.arm$mean[k]<-mean (outputfull$distance[outputfull$age==age])
      newdat.arm$ci.ub[k]<-quantile 
 (outputfull$distance[outputfull$age==age],0.975)
      }

 detach (package:arm)
 detach (package:abind)
 detach (package:car)
 detach (package:nnet)
 detach (package:survival)
 detach (package:splines)

 #possible to plot them all out and see the differences in CIs, but 
 easiest to just look at the intervals for age=8 as a point example where 
 there are clear differences
 mcmcvers$age$lower[mcmcvers$age$x==8]
 mcmcvers$age$upper[mcmcvers$age$x==8]

 newdat.arm$ci.lb[newdat.arm$age==8]
 newdat.arm$ci.ub[newdat.arm$age==8]

 newdat$plo[newdat$age==8]
 newdat$phi[newdat$age==8]