[R-sig-ME] lmer: parametric bootstrap versus mcmcsamp

Ben Bolker bbolker at gmail.com
Thu Sep 5 00:07:49 CEST 2013


Gustaf Granath <gustaf.granath at ...> writes:

> 
> Hi all,
 
> I'm trying to estimate the posterior distribution of some variance
> components. The end application is to compare Qst and Fst values
> (quantitative genetics). A full Bayesian approach is straight
> forward but in many cases you have only 4-5 populations making it
> hard to estimate the among population variance, especially since
> variances often are small (yes, I know, 4 is too low....). You need
> a strong prior to get the mcmc to converge and produce a useful
> posterior.

  My initial reaction is that if you need a strong prior to get
MCMC to converge, anything you do is going to be a little bit
dodgy -- you're in a situation where you don't really have quite
enough data to do what you want, and you're likely to end up
with problems such as biased estimates of the variance (e.g.
http://rpubs.com/bbolker/4187) or a shaky plug-in assumption in the
parametric bootstrap (i.e. the assumption that the estimates are
approximately the same as the true values).
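
  For what it's worth, a rough simulation along these lines (all
numbers here are made up, not taken from your data) shows how unstable
the among-population variance estimate is with only 4 populations:

    library(lme4)
    set.seed(101)
    npop <- 4; nper <- 20
    sd.pop <- sqrt(2); sd.res <- 1        ## arbitrary 'true' values
    sim.once <- function() {
        pop <- factor(rep(seq_len(npop), each = nper))
        y <- rnorm(npop, sd = sd.pop)[as.integer(pop)] +
             rnorm(npop * nper, sd = sd.res)
        fit <- lmer(y ~ 1 + (1 | pop))
        VarCorr(fit)$pop[1, 1]            ## estimated among-population variance
    }
    est <- replicate(200, sim.once())
    summary(est)                          ## wide spread around the true value of 2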
 
> To avoid the use of a strong prior I thought of using parametric
> bootstrap (a recommended approach to get CI on variance components
> and variance ratios). Does it make sense to use the posterior
> distribution after parametric bootstrap in a similar way as you use
> the posterior after MCMC sampling?  E.g. in further calculations
> where you want to include the uncertainty in the variance
> component. Of course, these are not the same posterior (MCMC vs.
> parametric bootstrap) but there seems to be a close connection
> (Efron, http://arxiv.org/abs/1301.2936). That paper is over my
> head, though, so clarifications are welcome.

  In principle, yes.
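
  Something like the following sketch is what I have in mind -- 'fit0'
and the grouping-factor name "pop" are placeholders for your model:
simulate new data from the fitted model, refit, keep the variance
component, and then push each replicate through whatever downstream
calculation (a Qst-style ratio, say) you need:

    pb.varcomp <- function(fit, nsim = 1000, comp = "pop") {
        replicate(nsim, {
            yrep <- simulate(fit)[[1]]                ## new response from the fitted model
            VarCorr(refit(fit, yrep))[[comp]][1, 1]   ## re-estimated variance component
        })
    }
    ## pb.pop <- pb.varcomp(fit0, nsim = 10000)
    ## quantile(pb.pop, c(0.025, 0.975))

The collection of replicates can then be treated much as you would
treat an MCMC sample, subject to the plug-in caveat above.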
 
> I also looked at the option to use mcmcsamp() to get the posterior
> distribution of a variance component.  Comparing the parametric
> bootstrap and mcmcsamp gave rather different answers. See (vertical
> line illustrates estimated variance):
> https://dl.dropboxusercontent.com/u/20596053/Rlist/bootVSmcmcsamp.png
> Parametric bootstrap gives a nice posterior around the estimated
> mean while mcmcsamp behaves like in the simulation (see below) with
> the peak closer to zero. Looking at the MCMC sampling, it seems like
> it gets stuck at zero a few times but it doesn't appear to be a major
> problem. The model is fairly simple: lmer( trait ~ 1 + (1 | pop /
> pop.sire / pop.sire.dam ), data ) and the figure shows the pop.sire
> component, which has >30 levels. lme4 version 0.999999-2.

  A small detail -- it looks worth setting 'from=0' to constrain
the density estimate to non-negative values.  (Histograms are good too but somewhat
harder to overlay.)  A histogram _might_ indicate that the main
difference is really in the size of the point mass at zero -- there
can be artifacts in density plots due to the smoothing ...
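
  Concretely, something like this (with 'pb.samples' and 'mcmc.samples'
standing in for your two sets of draws, as numeric vectors):

    hist(mcmc.samples, freq = FALSE, breaks = 50,
         main = "pop.sire variance component", xlab = "variance")
    lines(density(pb.samples, from = 0), col = "blue")    ## density constrained at 0
    lines(density(mcmc.samples, from = 0), col = "red")
    mean(mcmc.samples < 1e-6)    ## rough size of the point mass at zero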

> I know that the mcmcsamp function has been criticized and not been
>  considered reliable. Is this still the case?

I don't know.  Unfortunately I don't know of a good catalogue of bad
mcmcsamp() examples; my impression was that most of them were 'sticky'
zero boundaries, which you say isn't really a problem in your case.

> A simple simulation of data with few groups (n=4) and rather low
> variance (2) gives matching results between parametric bootstrap
> (n=10000) and mcmcsamp (n=10000). See code below to reproduce the
> figure found here:
> https://dl.dropboxusercontent.com/u/20596053/Rlist/sim_data_bootVSmcmcsamp.png
> Also with a high number of levels (n=30), the distribution peaks
> much closer to zero.
> https://dl.dropboxusercontent.com/u/20596053/Rlist/sim_data_2_bootVSmcmcsamp.png

  These are both useful; the latter is particularly interesting -- 
looks almost multimodal ...
 
> ps. profiling might be a 3rd approach (?) that I never got to.

   I am encouraged by your later report (i.e. that the parametric
bootstrap is accurate, and better than mcmcsamp ...)
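
  For completeness, the profiling approach you mention in the ps. would
look roughly like this in a sufficiently recent lme4 (which has
profile() and confint() methods for fitted models; 'fit0' is again a
placeholder for the fitted model):

    pr <- profile(fit0)
    confint(pr)    ## profile CIs, including the random-effect standard deviations
    ## or, equivalently, confint(fit0, method = "profile")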


