[R-sig-ME] lmer: parametric bootstrap versus mcmcsamp

Gustaf Granath gustaf.granath at gmail.com
Wed Sep 4 07:15:53 CEST 2013


Hi all,

I´m trying to estimate the posterior distribution of some variance components. The end application is to compare Qst and Fst values (quantitative genetics). A full Bayesian approach is straight forward but in many cases you have only 4-5 populations making it hard to estimate the among population variance, especially since variances often are small (yes, I know, 4 is too low....). You need a strong prior to get the mcmc to converge and produce a useful posterior.

To avoid the use of a strong prior I thought of using parametric bootstrap (a recommended approach to get CI on variance components and variance ratios). Does it make sense to use the posterior distribution after parametric bootstrap in a similar way as you use the posterior after MCMC sampling?
E.g. in further calculations where you want to include the uncertainty in the variance component. Of course, this is not the same posterior (MCMC and pm bootstrap) but there seems to be a close connection (Efron,http://arxiv.org/abs/1301.2936). Although this paper is over my head so clarifications are welcome.

I also looked at the option to use mcmcsamp() to get the posterior distribution of a variance component. Comparing these parametric bootstrap and mcmcsamp gave rather different answers. See (vertical line illustrates estimated variance):
https://dl.dropboxusercontent.com/u/20596053/Rlist/bootVSmcmcsamp.png
Parametric bootstrap gives a nice posterior around the estimated mean while mcmcsamp behaves like in the simulation (see below) with the peak closer to zero. Looking at the MCMC sampling, it seems like it gets stuck at zero a few times but it doesnt appear to be a major problem. To model is fairly simple:
lmer( trait ~ 1 + (1 | pop / pop.sire / pop.sire.dam ), data ) and the figure show the pop.sire component, which has >30 levels. lme4 version 0.999999-2.

I know that the mcmcsamp function has been criticized and not been considered reliable. Is this still the case?
  
A simple simulation of data with few groups (n=4) and rather low variance (2) give matching results between parametric bootstrap (n=10000) and mcmcsamp (n=10000). See code below to reproduce the figure found here:
https://dl.dropboxusercontent.com/u/20596053/Rlist/sim_data_bootVSmcmcsamp.png
Also with a high number of levels (n=30), the distribution peaks much closer to zero.
https://dl.dropboxusercontent.com/u/20596053/Rlist/sim_data_2_bootVSmcmcsamp.png

Any ideas or comments are welcome.

Cheers,

Gustaf Granath

ps. profiling might be a 3rd approach (?) that I never got to.


##Code to reproduce simulation study
##Some functions copied from Ben Bolker´s 
http://ms.mcmaster.ca/~bolker/R/misc/mcmcsampex.pdf.

     #Functions to get variance components from the mcmcsamp object
     get_pvars <- function(object) {
     v <- c(unlist(VarCorr(object)),units=sigma(object)^2)
     names(v) <- c(names(getME(object,"theta")),"units")
     v
}
get_mcvars <- function(object,mcmc,mcmcglmm=NULL) {
     ## get pvars, convert to 1-row data frame
     pvars <- as.data.frame(rbind(get_pvars(object)))
     ## get mcmc vars, drop fixed effects, rename
     nfix <- length(fixef(object))
     m0 <- as.data.frame(mcmc,type="varcov")[,-(1:nfix)]
     mmvars <- setNames(m0,names(pvars))
     L <- list(fitted=pvars,mcmc=mmvars)
     if (!is.null(mcmcglmm)) {
         1
         m1 <- as.data.frame(mcmcglmm$VCV)
         ## re-arrange column names to match lmer
         regexstr <-
                 "([[:alpha:]_\\(\\)]+)\\.([[:alpha:]_\\(\\)]+)"
         names(m1) <- gsub(regexstr,"\\2.\\1",names(m1))
         f1 <- rbind(colMeans(m1))
         ## combine mcmc and lmer results
         L$mcmc <- rbind(data.frame(method="mcmcsamp",
                         step=seq(nrow(L$mcmc)),
                         L$mcmc,check.names=FALSE),
                 data.frame(method="MCMCglmm",
                         step=seq(nrow(m1)),m1,check.names=FALSE))
         L$fitted <- rbind(data.frame(method="mcmcsamp",
                         L$fitted,check.names=FALSE),
                 data.frame(method="MCMCglmm",f1,check.names=FALSE))
     }
     L
}

#Simulate data
     set.seed(666)
     sims <- function(I, J, sigmab0, sigmaw0){
         Mu <- rnorm(I, mean=0, sd=sigmab0)
         y <- c(sapply(Mu, function(mu) rnorm(J, mu, sigmaw0)))
         data.frame(y=y, group=gl(I,J))
     }
     I <- 4 # number of groups
     J <- 50 # number of repeats per group
     sigmab0 <- sqrt(2) # between standard deviation
     sigmaw0 <- sqrt(3) # within standard deviation
     #Only 4 groups with variance = 2

     dat.sim <- sims(I, J, sigmab0, sigmaw0)
     mod <- lmer(y ~1 + (1|group), dat.sim)
     mod
     mm <- mcmcsamp(mod, n=10000)
     v1 <- get_mcvars(mod,mm)
     Vpop2=as.numeric(10000)
      #Warning: this takes a while
       for (i in 1:10000) {
         dat2<-simulate(mod)
         model2<-refit(mod, dat2)
         Vpop2[i]=VarCorr( model2)$group[1]
       }
     den <- data.frame(y = c(v1$mcmc[,1], Vpop2 ), met = rep( 
c("mcmcsamp","parametric boot"), 10000) )
     library(lattice)
     densityplot(~ y, data=den, groups = met, xlim = c(-1,10), 
main="Simulated data - between groups (n=4) variance = 2",
             plot.points = TRUE, xlab = "VAR(group)", auto.key = 
list(space = "right"))
     trellis.focus("panel", 1, 1, highlight = FALSE)
     panel.abline(v = VarCorr(mod)$group[1] )
     trellis.unfocus()

-- 
Gustaf Granath (PhD)
Post doc
McMaster University



More information about the R-sig-mixed-models mailing list