[R-sig-ME] effective sample size in MCMCglmm

dani orchidn at live.com
Tue Oct 10 21:41:35 CEST 2017


Hello again,


Thank you so much, I will change the code and specify pl=FALSE!

I ran the model based on Matthew's suggestion and got the output below. What effective sample size should I aim for on the G-structure term (studyid)? Should I keep tweaking the specifications, e.g. with a higher burnin, until I reach an effective sample size of at least 100 for studyid?


On a separate note, I managed to plot the model, but the result is illegible: the panels are tiny given the large number of groups. Should I plot just a random subsample instead?
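
(A minimal sketch of how one might plot just a subset of the chains, assuming the fitted object is called m3new as in Matthew's code quoted below; plot() on an mcmc object accepts column subsets:)

library(coda)             # loaded by MCMCglmm; provides the mcmc plot methods
plot(m3new$VCV)           # variance components only (4 traces here)
plot(m3new$Sol[, 1:4])    # a few location effects at a time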


Thank you all so much,

D

Iterations = 3001:102901
Thinning interval  = 100
Sample size  = 1000

DIC: 2928.214

 G-structure:  ~studyid

           post.mean  l-95% CI u-95% CI eff.samp
studyid      0.06139 5.424e-16   0.4621    21.16

               ~class

      post.mean l-95% CI u-95% CI eff.samp
class    0.7278    0.132    1.232    154.9

               ~idv(l_lfvcspn)

           post.mean l-95% CI u-95% CI eff.samp
l_lfvcspn.     752.6    50.16     2352     1000

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units     2.533    1.907    3.204    186.5

 Location effects: y ~ f_newage_c + x2n + x8n + x9n + x5n + l_lfvcspo + x3n + x4n + x6n + x7n + offset

            post.mean  l-95% CI  u-95% CI eff.samp  pMCMC
(Intercept) -7.033381 -7.535696 -6.524339    487.7 <0.001 ***
f_newage_c   0.009066 -0.025116  0.041159    909.3  0.596
x2nM        -0.107655 -0.403837  0.165664   1000.0  0.440
x8n1         0.413878  0.119578  0.745171    891.1  0.002 **
x9n1        -0.287382 -0.600003  0.016531    892.7  0.070 .
x5n         -0.001037 -0.006182  0.004417    864.8  0.670
l_lfvcspo    0.397952 -0.941309  1.455719   1000.0  0.426
x3n          0.079109 -0.001274  0.150664    914.1  0.042 *
x4n          0.058303 -0.067089  0.174404   1000.0  0.344
x6n         -0.013916 -0.067176  0.049719   1000.0  0.610
x7n         -0.014880 -0.081243  0.057582    710.6  0.654
offset       0.999452  0.977609  1.018083   1000.0 <0.001 ***
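
(An aside, not from this thread: for a variance component that mixes poorly near zero, such as studyid here, the MCMCglmm Course Notes describe parameter-expanded priors. A sketch, reusing the prior2 structure quoted below:)

# Parameter expansion for G1 (studyid): with V=1 and nu=1, setting alpha.mu=0
# and alpha.V=1000 puts (approximately) a half-Cauchy prior on the studyid
# standard deviation, which often mixes better than the default near zero.
prior2$G$G1 <- list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000)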




________________________________
From: Jarrod Hadfield <j.hadfield at ed.ac.uk>
Sent: Tuesday, October 10, 2017 12:29 PM
To: dani; Matthew; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] effective sample size in MCMCglmm

Hi,

You probably want pl=FALSE too, unless you have a special reason to
save the latent variables. Currently this saves 20,000 * 7,000 =
140,000,000 numbers, which will take about 1 GB of memory.
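
(A quick back-of-the-envelope check of that figure, assuming each stored
value is an 8-byte double:)

20000 * 7000 * 8 / 2^30   # samples x latent variables x bytes, ~1.04 GiB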

Cheers,

Jarrod



On 10/10/2017 19:21, dani wrote:
> Hello Matthew,
>
>
> Thank you so much for such a thorough answer! This is so helpful! I truly appreciate it!
>
> Best regards,
>
> D
>
> ________________________________
> From: Matthew <mew0099 at auburn.edu>
> Sent: Tuesday, October 10, 2017 11:17 AM
> To: dani; r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] effective sample size in MCMCglmm
>
> Hi Dani,
>
> You might want to have a good read through the extensive Course Notes
> that Jarrod Hadfield has written to accompany the MCMCglmm package.
> In particular (but not exclusively), Section 1.3.1 and all of Section
> 1.4 pertain to these issues.
>
> Your specifications of `nitt`, `thin`, and `burnin` are such that you
> have retained almost 20,000 samples, so it is not surprising your
> computer is having issues. However, the very low effective sample sizes
> mean that those almost 20,000 retained samples are far from independent.
>
> What you really need to change is the thinning interval. Given how bad
> your effective sample sizes are compared to the total number of samples
> retained, I would start with `thin=100` and see if that reduces the
> autocorrelation between successive samples. You can check the
> autocorrelation with `autocorr.diag(m3new$Sol[, 1:k])` and
> `autocorr.diag(m3new$VCV)`, for the location effects and variance
> components, respectively.
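>
> (A complementary check, sketched here rather than taken from the original
> message: coda's effectiveSize() estimates roughly N / (1 + 2 * sum of the
> chain's autocorrelations), i.e. how many effectively independent samples
> you have.)
>
> library(coda)
> effectiveSize(m3new$Sol[, 1:k])  # location effects
> effectiveSize(m3new$VCV)         # variance components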
>
> You will need to figure out the best `burnin`, but start with
> `burnin=3000` and increase if the traceplots show a pattern at the
> beginning of the trace.
>
> Remember to adjust `nitt` so the MCMC runs long enough to get the
> desired number of samples, but not excessively long: nitt = burnin +
> thin*(number of samples to keep). All of this results in the suggested
> model specification below, which will likely need to be changed once
> you diagnose the performance with your actual data:
>
> nsamp <- 1000
> THIN <- 100
> BURNIN <- 3000
> NITT <- BURNIN + THIN*nsamp
> m3new <- MCMCglmm(y ~ f_newage_c+x2n+x8n+x9n+x5n+l_lfvcspo+x3n+x4n+x6n+x7n+offset,
>                     random =~ studyid+class+idv(l_lfvcspn),
>                     data   = wo1,
>                     family = "poisson", prior=prior2,
>                     verbose=FALSE,
>                     thin   = THIN,     #<-- CHANGED
>                     burnin = BURNIN,   #<-- CHANGED
>                     nitt   = NITT,     #<-- CHANGED
>                     saveX=TRUE, saveZ=TRUE, saveXL=TRUE, pr=TRUE, pl=TRUE)
> autocorr.diag(m3new$Sol[, 1:k])
> autocorr.diag(m3new$VCV)
>
> With regard to your other message concerning the trace plots, the
> problem is likely that you have so many samples (almost 20,000). Once
> you have changed `nitt` to reflect the `thin` and `burnin` that give
> you the least autocorrelation, and effective sample sizes close to the
> number of samples you retain, you should be able to save the model and
> run the MCMC diagnostics much more easily.
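>
> (Two standard convergence checks from coda, sketched here as a starting
> point: geweke.diag() compares the mean of the first 10% of each chain
> with the last 50%, and heidel.diag() runs stationarity and half-width
> tests.)
>
> geweke.diag(m3new$VCV)
> heidel.diag(m3new$VCV)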
>
> Sincerely,
> Matthew
>
>
>
> On 10/10/2017 12:44 PM, dani wrote:
>> Hello everyone,
>>
>>
>> My question is:
>>
>> do the effective sample sizes in my MCMCglmm output (pasted below) make sense?
>>
>>
>> I understand that the rule of thumb is to aim for effective sample sizes of at least 100-1000. How should I tweak the thin, burnin, and nitt specifications? My computer reaches its memory limit quickly, and I have barely been able to run the model below.
>>
>>
>> I have the following model:
>>
>> k <- 12  # number of fixed effects (columns of the design matrix)
>>
>> prior2 <- list(B = list(V = diag(k)*1e4, mu = rep(0, k)),  # diffuse normal prior on fixed effects
>>                R = list(V = 1, nu = 0),                    # residual variance
>>                G = list(G1 = list(V = 1, nu = 0),          # studyid variance
>>                         G2 = list(V = 1, nu = 0),          # class variance
>>                         G3 = list(V = 1, nu = 0)))         # idv(l_lfvcspn) variance
>>
>> # Constrain the coefficient of the offset term to (effectively) 1:
>> # prior mean 1 with a very small prior variance.
>> prior2$B$mu[k] <- 1
>> prior2$B$V[k,k] <- 1e-4
>>
>> m3new <- MCMCglmm(y ~ f_newage_c+x2n+x8n+x9n+x5n+l_lfvcspo+x3n+x4n+x6n+x7n+offset,
>>                     random =~ studyid+class+idv(l_lfvcspn),
>>                     data   = wo1,
>>                     family = "poisson", prior=prior2,
>>                     verbose=FALSE,
>>                     thin   = 10,
>>                     burnin = 2000,
>>                     nitt   = 200000,
>>                     saveX=TRUE, saveZ=TRUE, saveXL=TRUE, pr=TRUE, pl=TRUE)
>>
>>
>> Iterations = 2001:199991
>> Thinning interval  = 10
>> Sample size  = 19800
>> DIC: 2930.006
>>
>> G-structure:
>> ~studyid
>>           post.mean  l-95% CI u-95% CI eff.samp
>> studyid      0.1053 1.814e-11   0.5757    81.12
>> ~class
>>        post.mean l-95% CI u-95% CI eff.samp
>> class     0.7008  0.07577    1.207    382.1
>>
>> ~idv(l_lfvcspn)
>>            post.mean l-95% CI u-95% CI eff.samp
>> l_lfvcspn.     705.7    37.33     2044    11852
>>
>> R-structure:
>> ~units
>>        post.mean l-95% CI u-95% CI eff.samp
>> units      2.516    1.809     3.23    336.6
>>
>> Location effects: y ~ f_newage_c + x2n + x8n + x9n + x5n + l_lfvcspo + x3n + x4n + x6n + x7n + offset
>>              post.mean   l-95% CI   u-95% CI eff.samp  pMCMC
>> (Intercept) -7.0395427 -7.5590206 -6.5187847    865.6 <5e-05 ***
>> f_newage_c   0.0099703 -0.0222981  0.0448880   3324.7 0.5615
>> x2nM        -0.1068377 -0.3782251  0.1678528   3760.2 0.4462
>> x8n1         0.4103047  0.0920875  0.7179638   3884.3 0.0099 **
>> x9n1        -0.2784715 -0.5975232  0.0495615   3337.9 0.0901 .
>> x5n         -0.0009378 -0.0064175  0.0044266   3528.4 0.7283
>> l_lfvcspo    0.4018810 -0.8271349  1.4468375  14536.6 0.4080
>> x3n          0.0789652 -0.0018683  0.1523108   3726.0 0.0438 *
>> x4n          0.0602655 -0.0643903  0.1859443   2711.9 0.3356
>> x6n         -0.0137132 -0.0728385  0.0449804   3489.7 0.6453
>>
>> Thank you all so much,
>> Dani
>>
>>
>
> --
>
>
> ****************************************************
> Matthew E. Wolak, Ph.D.
> Assistant Professor
> Department of Biological Sciences
> Auburn University
> 306 Funchess Hall
> Auburn, AL 36849, USA
> Email: matthew.wolak at auburn.edu
> Tel: 334-844-9242
>
>




