[R-sig-ME] effective sample size in MCMCglmm

Tue Oct 10 20:17:20 CEST 2017

Hi Dani,

You might want to have a good read through the extensive Course Notes 
that Jarrod Hadfield has written to accompany the MCMCglmm package. 
Particularly (but not exclusively), Sections 1.3.1 and all of 1.4 
pertain to these issues.

Your specifications of `nitt`, `thin`, and `burnin` are such that you 
have retained almost 20,000 samples so it is not surprising your 
computer is having issues. However, the effective sample sizes being so 
low means that the almost 20,000 samples you have retained are not 
independent.

What you really need to change is the thinning interval. Given how bad 
your effective sample sizes are compared to the total number of samples 
retained, I would start with `thin=100` and see if that reduces the 
autocorrelation between successive samples. You can check the 
autocorrelation with `autocorr.diag(m3.new$Sol[, 1:k])` and 
`autocorr.diag(m3.new$VCV)`, for the location effects and variance 
components, respectively.

You will need to figure out the best `burnin`, but start with 
`burnin=3000` and increase if the traceplots show a pattern at the 
beginning of the trace.

Remember to adjust `nitt` to run the MCMC long enough to get the desired 
number of samples, but not excessively longer. So nitt = burnin + 
thin*(number of samples to keep). All of this will result in a suggested 
model specification like the following, but this will likely need to be 
changed once you diagnose the performance with your actual data:

nsamp <- 1000
THIN <- 100
BURNIN <- 3000
NITT <- BURNIN + THIN*nsamp
m3new <- MCMCglmm(y ~ f_newage_c+x2n+x8n+x9n+x5n+l_lfvcspo+x3n+x4n+x6n+x7n+offset,
                   random =~ studyid+class+idv(l_lfvcspn),
                   data   = wo1,
                   family = "poisson", prior=prior2,
                   verbose=FALSE,
                   thin   = THIN,     #<-- CHANGED
                   burnin = BURNIN,   #<-- CHANGED
                   nitt   = NITT,     #<-- CHANGED
                   saveX=TRUE, saveZ=TRUE, saveXL=TRUE, pr=TRUE, pl=TRUE)
autocorr.diag(m3new$Sol[, 1:k])
autocorr.diag(m3new$VCV)

With regards to your other message concerning the trace plots, this is 
likely because you have so many samples (almost 20,000). Once you have 
changed the `nitt` to reflect the `thin` and `burnin` required to give 
you the least autocorrelation and effective samples close to the number 
of samples you want, then you should be able to save the model and run 
the MCMC diagnostics much more easily.

Sincerely,
Matthew

On 10/10/2017 12:44 PM, dani wrote:
> Hello everyone,
>
>
> My question is:
>
> do the effective samples I obtain in my MCMCglmm output (attached below) make sense?
>
>
> I understand that the rule of thumb is to get effective samples of at least 100-1000. How should I tweak the thin, burnin, and the nitt specifications? My computer reaches its memory limit fast and I have barely been able to run the model below.
>
>
> I have the following model:
>
> k<-12 # number of fixed effects
>
> prior2<-list(B=list(V=diag(k)*1e4, mu=rep(0,k)),
>               R=list(V=1, nu=0),
>               G=list(G1=list(V=1, nu=0),
>                      G2=list(V=1, nu=0),
>                      G3=list(V=1, nu=0)))
>
> prior2$B$mu[k]<-1
> prior2$B$V[k,k]<-1e-4
>
> m3new <- MCMCglmm(y ~ f_newage_c+x2n+x8n+x9n+x5n+l_lfvcspo+x3n+x4n+x6n+x7n+offset,
>                    random =~ studyid+class+idv(l_lfvcspn),
>                    data   = wo1,
>                    family = "poisson", prior=prior2,
>                    verbose=FALSE,
>                    thin   = 10,
>                    burnin = 2000,
>                    nitt   = 200000,
>                    saveX=TRUE, saveZ=TRUE, saveXL=TRUE, pr=TRUE, pl=TRUE)
>
>
> Iterations = 2001:199991
> Thinning interval  = 10
> Sample size  = 19800
> DIC: 2930.006
>
> G-structure:
> ~studyid
>             post.mean  l-95% CI u-95% CI eff.samp
> studyid    0.1053 1.814e-11   0.5757    81.12
> ~class
>            post.mean l-95% CI u-95% CI eff.samp
> class    0.7008  0.07577    1.207    382.1
>
> ~idv(l_lfvcspn)
>                   post.mean l-95% CI u-95% CI eff.samp
> l_lfvcspn.     705.7    37.33     2044    11852
>
> R-structure:
> ~units
>             post.mean l-95% CI u-95% CI eff.samp
> units     2.516    1.809     3.23    336.6
>
> Location effects: y ~ f_newage_c + x2n + x8n + x9n + x5n + l_lfvcspo + x3n + x4n + x6n + x7n + offset
>                         post.mean   l-95% CI   u-95% CI eff.samp  pMCMC
> (Intercept) -7.0395427 -7.5590206 -6.5187847    865.6 <5e-05 ***
> f_newage_c   0.0099703 -0.0222981  0.0448880   3324.7 0.5615
> x2nM        -0.1068377 -0.3782251  0.1678528   3760.2 0.4462
> x8n1         0.4103047  0.0920875  0.7179638   3884.3 0.0099 **
> x9n1        -0.2784715 -0.5975232  0.0495615   3337.9 0.0901 .
> x5n         -0.0009378 -0.0064175  0.0044266   3528.4 0.7283
> l_lfvcspo    0.4018810 -0.8271349  1.4468375  14536.6 0.4080
> x3n          0.0789652 -0.0018683  0.1523108   3726.0 0.0438 *
> x4n          0.0602655 -0.0643903  0.1859443   2711.9 0.3356
> x6n         -0.0137132 -0.0728385  0.0449804   3489.7 0.6453
>
> Thank you all so much,
> Dani
>
>
> <http://aka.ms/weboutlook>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 

****************************************************
Matthew E. Wolak, Ph.D.
Assistant Professor
Department of Biological Sciences
Auburn University
306 Funchess Hall
Auburn, AL 36849, USA
Email: matthew.wolak at auburn.edu
Tel: 334-844-9242