[R-sig-ME] lmer: parametric bootstrap versus mcmcsamp

Gustaf Granath gustaf.granath at gmail.com
Thu Sep 5 06:31:14 CEST 2013


Thanks for the comments, Ben. Additional simulations with 2-3 
hierarchical levels give stronger evidence that the parametric bootstrap 
is doing a good job. The bootstrap distributions are often virtually 
identical to the posteriors obtained with MCMCglmm. mcmcsamp() is, 
however, generally very unstable in my opinion. It sometimes surprises 
me and matches the other results, but I haven't figured out under which 
conditions that happens.

I understand that having few levels is problematic, but for some 
applications/questions you may still be able to get sensible answers 
(e.g. the bias may be small compared with other uncertainties, etc.).

Sometimes the parametric bootstrap runs fail (convergence problems). I 
just read that the upcoming lme4 release will have a slot (I guess it is 
already implemented in the development version) where warning messages 
are stored. This will make it easier to sort out bootstrap runs that 
encountered problems.
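
Until then, something along these lines works for capturing warnings by 
hand during the refits (a rough sketch, not the lme4 feature mentioned 
above; the helper name and the way the variance component is extracted 
are only illustrative):

library(lme4)

pb_with_warnings <- function(fit, nsim = 1000) {
  vc   <- numeric(nsim)    # bootstrap draws of one variance component
  warn <- character(nsim)  # warning message from each refit ("" if none)
  ysim <- simulate(fit, nsim = nsim)
  for (i in seq_len(nsim)) {
    w <- ""
    refit_i <- withCallingHandlers(
      refit(fit, ysim[[i]]),
      warning = function(cond) {
        w <<- conditionMessage(cond)
        invokeRestart("muffleWarning")
      })
    ## first listed component; in practice pick the row you need by its
    ## "grp" label (as.data.frame(VarCorr()) as in current lme4)
    vc[i]   <- as.data.frame(VarCorr(refit_i))$vcov[1]
    warn[i] <- w
  }
  data.frame(vc = vc, warning = warn, stringsAsFactors = FALSE)
}

## e.g. inspect or drop the replicates that produced warnings:
## pb <- pb_with_warnings(fit, 1000); subset(pb, warning != "")

(Newer lme4 also provides bootMer() for this kind of loop.)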

Gustaf

>> Hi all,
>> I'm trying to estimate the posterior distribution of some variance
>> components. The end application is to compare Qst and Fst values
>> (quantitative genetics). A full Bayesian approach is straightforward,
>> but in many cases you have only 4-5 populations, making it hard to
>> estimate the among-population variance, especially since the
>> variances are often small (yes, I know, 4 is too low....). You need
>> a strong prior to get the MCMC to converge and produce a useful
>> posterior.
>    My initial reaction is that if you need a strong prior to get
> MCMC to converge, anything you do is going to be a little bit
> dodgy -- you're in a situation where you don't really have quite
> enough data to do what you want, and you're likely to end up with
> problems such as biased estimates of the variance (e.g.
> http://rpubs.com/bbolker/4187) or a questionable plug-in assumption
> in the parametric bootstrap (i.e. that the estimates are
> approximately equal to the true values)
>   
>> To avoid the use of a strong prior I thought of using the parametric
>> bootstrap (a recommended approach for getting CIs on variance
>> components and variance ratios). Does it make sense to use the
>> parametric bootstrap distribution in a similar way as you would use
>> the posterior after MCMC sampling, e.g. in further calculations where
>> you want to include the uncertainty in the variance component? Of
>> course, this is not the same as a posterior (MCMC vs. parametric
>> bootstrap), but there seems to be a close connection
>> (Efron, http://arxiv.org/abs/1301.2936), although that paper is over
>> my head, so clarifications are welcome.
>    In principle, yes.
>   
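
For illustration, the usual way to do that is to compute the derived 
quantity once per bootstrap replicate and summarise that distribution. A 
minimal sketch, assuming vc.pop and vc.res are vectors holding the 
bootstrapped between-population and residual variance components 
(hypothetical names; the exact ratio to use depends on the design):

## compute a generic between/(between + within) ratio per replicate
ratio <- vc.pop / (vc.pop + vc.res)
quantile(ratio, c(0.025, 0.5, 0.975))  # median and a 95% interval
hist(ratio, main = "bootstrap distribution of the ratio")
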
>> I also looked at the option of using mcmcsamp() to get the posterior
>> distribution of a variance component. Comparing the parametric
>> bootstrap and mcmcsamp gave rather different answers. See (the
>> vertical line marks the estimated variance):
>> https://dl.dropboxusercontent.com/u/20596053/Rlist/bootVSmcmcsamp.png
>> The parametric bootstrap gives a nice distribution around the
>> estimate, while mcmcsamp behaves like in the simulation (see below),
>> with the peak closer to zero. Looking at the MCMC sampling, it seems
>> to get stuck at zero a few times, but that doesn't appear to be a
>> major problem. The model is fairly simple: lmer( trait ~ 1 + (1 | pop /
>> pop.sire / pop.sire.dam ), data ) and the figure shows the pop.sire
>> component, which has >30 levels. lme4 version 0.999999-2.
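
For reference, the pop.sire component shown in that figure can be pulled 
out of such a fit roughly like this (a sketch; 'dat' and the exact labels 
of the expanded nesting terms are assumptions -- print VarCorr(fit) to 
check them):

library(lme4)
fit <- lmer(trait ~ 1 + (1 | pop/pop.sire/pop.sire.dam), data = dat)
vc  <- as.data.frame(VarCorr(fit))
vc  # inspect the labels lme4 gives the expanded nesting terms
## the pop.sire (sire-within-population) variance component:
vc$vcov[grepl("pop.sire", vc$grp) & !grepl("dam", vc$grp)]
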
>    A small detail -- it looks worth setting 'from=0' to constrain
> the density estimate above 0.  (Histograms are good too but somewhat
> harder to overlay.)  A histogram _might_ indicate that the main
> difference is really in the size of the point mass at zero -- there
> can be artifacts in density plots due to the smoothing ...
>
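
Concretely, something like this (vc.boot standing in for a vector of 
bootstrap or MCMC draws of the variance component):

## constrain the kernel density estimate to non-negative values and
## overlay a histogram, which shows any point mass at zero directly
plot(density(vc.boot, from = 0), main = "variance component")
hist(vc.boot, breaks = 50, freq = FALSE, add = TRUE,
     col = rgb(0, 0, 0, 0.2))
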
>> I know that the mcmcsamp function has been criticized and is not
>> considered reliable. Is this still the case?
> I don't know.  Unfortunately I don't know of a good catalogue of bad
> mcmcsamp() examples; my impression was that most of them were 'sticky'
> zero boundaries, which you say isn't really a problem in your case.
>
>> A simple simulation of data with few groups (n=4) and rather low
>> variance (2) gives matching results between the parametric bootstrap
>> (n=10000) and mcmcsamp (n=10000). See the code below to reproduce
>> the figure found here:
>> https://dl.dropboxusercontent.com/u/20596053/Rlist/sim_data_bootVSmcmcsamp.png
>> Also with a high number of levels (n=30), the distribution peaks
>> much closer to zero:
>> https://dl.dropboxusercontent.com/u/20596053/Rlist/sim_data_2_bootVSmcmcsamp.png
>
>    These are both useful; the latter is particularly interesting --
> looks almost multimodal ...
>   
>> P.S. Profiling might be a third approach (?) that I never got to.
>     I am encouraged by your later report (i.e. that the parametric
> bootstrap is as accurate as, or better than, mcmcsamp ...)
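
A minimal sketch of a setup comparable to the simulation described above 
(4 groups, between-group variance 2; this is not the original code, and 
the sample size, seed and names are only illustrative):

library(lme4)
set.seed(1)
ngroup <- 4; nper <- 20                       # few groups, modest group size
grp.eff <- rnorm(ngroup, 0, sqrt(2))          # between-group variance = 2
dat <- data.frame(group = factor(rep(seq_len(ngroup), each = nper)))
dat$y <- grp.eff[as.integer(dat$group)] + rnorm(nrow(dat), 0, 1)

fit  <- lmer(y ~ 1 + (1 | group), data = dat)
ysim <- simulate(fit, nsim = 1000)            # parametric bootstrap responses
vc.boot <- sapply(seq_len(1000), function(i)
  as.data.frame(VarCorr(refit(fit, ysim[[i]])))$vcov[1])  # group variance
plot(density(vc.boot, from = 0),
     main = "parametric bootstrap, group variance")
## mcmcsamp(fit, n = 10000) provided the MCMC side of the comparison in
## the lme4 version cited above (0.999999-2); it is not available in
## current lme4 releases.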

-- 
Gustaf Granath (PhD)
Post doc
McMaster University


