[R-sig-ME] MCMCglmm: starting values and variance explained by random effects

Thu May 1 12:20:53 CEST 2014

Hi Jarrod,

Thanks for your quick reply.

I used start=list(QUASI=FALSE) because I read that the Gelman and Rubin diagnostic needs overdispersed starting values. In an earlier post here (23/10/2009, https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q4/002972.html) you said:
"If you want over-dispersed starting values to make sure the chain is converging to the same distribution you can specify starting values in the start argument of MCMCglmm. start=list(QUASI=FALSE) is a quick way of getting semi-overdispersed starting values."

Due to the warning message, I assume the model did not use overdispersed starting values. Is it therefore still OK to use Gelman and Rubin's diagnostic? I plotted the traces and they show adequate mixing. If I can't use Gelman and Rubin's diagnostic (due to not using overdispersed starting values), is it sufficient to just check the trace plots for convergence?
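
For reference, here is a minimal base-R sketch of what the Gelman and Rubin diagnostic compares: the variance between chains against the variance within chains. The function name psrf_basic and the simulated chains are made up for illustration. coda's gelman.diag() applies a further correction for sampling variability, so its values can differ slightly from this uncorrected version.

```r
# Basic (uncorrected) potential scale reduction factor for one parameter.
# `chains` is a numeric matrix: one column per chain, one row per saved iteration.
psrf_basic <- function(chains) {
  n <- nrow(chains)                      # saved iterations per chain
  chain_means <- colMeans(chains)
  W <- mean(apply(chains, 2, var))       # mean within-chain variance
  B <- n * var(chain_means)              # between-chain variance
  var_hat <- (n - 1) / n * W + B / n     # pooled estimate of the posterior variance
  sqrt(var_hat / W)
}

set.seed(1)
n_iter <- 1000

# Three simulated chains sampling the same distribution: PSRF close to 1.
mixed <- matrix(rnorm(3 * n_iter), ncol = 3)

# Three simulated chains centred on different values: PSRF well above 1.
stuck <- sweep(matrix(rnorm(3 * n_iter), ncol = 3), 2, c(-3, 0, 3), "+")

psrf_mixed <- psrf_basic(mixed)
psrf_stuck <- psrf_basic(stuck)
print(c(mixed = psrf_mixed, stuck = psrf_stuck))
```

The diagnostic can only flag a problem if the chains disagree. Chains that start close together and all stay in the same region also give a value near 1, which is why dispersed starting values are usually recommended alongside it.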

Regarding the proportion of variance explained by the random effect: Can I measure the additional variance in the denominator coming from the Poisson distribution itself? Or is it something that I should not be concerned about? For example, I can confidently say that the majority of the variation in the data is explained by the random effect "site".

Thanks,

Mieke

Mieke Zwart
PhD student
School of Biology
Ridley 2
Newcastle University
Newcastle upon Tyne
NE1 7RU
United Kingdom
________________________________________
From: Jarrod Hadfield [j.hadfield at ed.ac.uk]
Sent: 01 May 2014 06:25
To: Mieke Zwart
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] MCMCglmm: starting values and variance explained by random effects

Hi Mieke,

Your model sounds reasonable. The warning is because you had
start=list(QUASI=FALSE) in the call to MCMCglmm, so `good' starting
values weren't used and the starting latent variables were drawn from
a unit normal. Nevertheless it looks like they converged, although I
always plot the traces to diagnose bad mixing (for the types of model
MCMCglmm fits you nearly always get convergence unless there is
something wrong with the model).

The proportion of variance explained that you give is for the latent
scale. This is OK, but you should bear in mind that on the data scale
there is additional variance in the denominator coming from the
Poisson distribution itself.
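
As a rough sketch of the difference between the two scales, the share of variance due to Site can be computed on both from posterior draws, using the standard moments of a Poisson log-normal model. This assumes a log link, a single random intercept, additive residual ("units") variance, and evaluation at the intercept only (the 'before' level of bef_af). The function name site_variance_share and the simulated draws are made up for illustration and are not results from the model in this thread.

```r
# Share of variance attributable to Site on the latent and on the data scale
# for a Poisson model with log link, one random intercept and additive
# residual ("units") variance.
# Assumption: evaluated at the intercept only (the 'before' level of bef_af).
site_variance_share <- function(beta0, v_site, v_units) {
  v_total <- v_site + v_units
  mu <- exp(beta0 + v_total / 2)              # expected count on the data scale
  v_between <- mu^2 * (exp(v_site) - 1)       # variance of site-level expected counts
  v_data <- mu + mu^2 * (exp(v_total) - 1)    # Poisson variance + log-normal variance
  data.frame(latent = v_site / v_total, data = v_between / v_data)
}

# Made-up posterior draws standing in for chain1$Sol[, "(Intercept)"],
# chain1$VCV[, "Site"] and chain1$VCV[, "units"].
set.seed(42)
beta0   <- rnorm(1000, mean = log(5), sd = 0.2)
v_site  <- rgamma(1000, shape = 20, rate = 25)
v_units <- rgamma(1000, shape = 20, rate = 50)

shares <- site_variance_share(beta0, v_site, v_units)

# Posterior summaries: 2.5%, 50% and 97.5% quantiles of each share.
share_summary <- sapply(shares, quantile, probs = c(0.025, 0.5, 0.975))
print(share_summary)
```

Under these assumptions the data-scale share is always smaller than the latent-scale share, because the Poisson sampling variance (the mu term) is added to the denominator only.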

The default priors for the fixed effects are as you say. For the
variance components the default is nu=0 (i.e. a flat improper prior).
It sounds from your verbal description as if there is strong support
for between-site heterogeneity in abundance.
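
To make the defaults concrete, the sketch below writes them out as an explicit prior list for a model with two fixed effects and one random term. The fixed-effect values are those quoted in the original message. V=1 for the variance components is an assumption based on my reading of the MCMCglmm documentation, and the alternative nu=0.002 is only an illustrative value. The object names prior_default and prior_alt are made up.

```r
# Explicit version of the defaults described above, for a model with two
# fixed effects ((Intercept) and bef_af1) and one random term (Site).
n_fixed <- 2

prior_default <- list(
  B = list(mu = rep(0, n_fixed), V = diag(n_fixed) * 1e10),  # fixed effects
  R = list(V = 1, nu = 0),                                   # residual ("units") variance
  G = list(G1 = list(V = 1, nu = 0))                         # Site variance
)

# An alternative with a small, non-zero degree of belief for the variances
# (0.002 is an illustrative choice, not a recommendation).
prior_alt <- prior_default
prior_alt$R$nu <- 0.002
prior_alt$G$G1$nu <- 0.002

# Either list would be passed to the model via the prior argument, e.g.
# (requires the MCMCglmm package and the data from the original post):
# MCMCglmm(counts ~ bef_af, random = ~Site, data = dataframe,
#          family = "poisson", prior = prior_alt)
str(prior_default)
```

Writing the prior out like this makes it easy to rerun the model under a different prior and check how sensitive the Site variance is to that choice.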

Cheers,

Jarrod

Quoting Mieke Zwart <m.c.zwart at newcastle.ac.uk> on Wed, 30 Apr 2014
18:06:38 +0000:

> Dear list members,
>
> First of all, I would like to say that I think it is great that this
> list exists. I have learned so much by reading posts regularly and
> searching for answers when I encounter a problem.
>
> I have searched extensively before making this post, but have not
> been able to find an answer to some specific issues I encountered:
>
> I need some help regarding results that I get from a model run with
> the package MCMCglmm. I thought I interpreted things correctly after
> reading a lot of posts on here and reading through the Course Notes
> of the package, however a recent paper of mine got rejected and one
> reviewer had quite a few problems with the model. Before I send the
> paper anywhere else I would like to make sure that I am interpreting
> and explaining things correctly.
>
> Some brief explanation about the study:
> The data contains counts of birds at 9 different locations before
> and after a development (several years before, and several years
> after (up to 15 years post-construction)). We are interested in
> whether the counts changed after development. Since the initial
> numbers at each site are variable and differ quite a lot between
> sites, I used a random effect for site.
> I used MCMCglmm because the models fitted with frequentist methods
> were overdispersed.
>
> The poisson model looks like this:
> MCMCglmm(counts ~ bef_af, random=~Site, data=dataframe, pr=TRUE,
> pl=TRUE, family="poisson", nitt=65000, thin=50, burnin=15000,
> start=list(QUASI=FALSE))
> where 'counts' is the number of birds per survey, 'bef_af' is a
> factor with either 0 or 1 (where 0 is before and 1 is after), 'site'
> is a character vector with the 9 different site names.
>
> The model is run 3 times to give 3 different chains. The chains are
> then checked for convergence via plotting:
> plot(mcmc.list(chain1$Sol, chain2$Sol, chain3$Sol))
> In addition, I checked the Gelman and Rubin's convergence diagnostic:
> gelman.diag(mcmc.list(chain1$Sol, chain2$Sol, chain3$Sol))
>
> The model gives the following warning about the starting values:
> Warning message:
> In MCMCglmm(counts ~ bef_af, random = ~Site, data = dataframe,  :
>   good starting values not obtained: using Norm(0,1)
>
> The plots show adequate mixing of the chains but I am wondering
> whether the chains started at different appropriate values due to
> the warning message. Should I be concerned about the warning
> message? Did it use starting values drawn from a normal distribution?
>
> The Gelman and Rubin's diagnostic gave the following:
> Potential scale reduction factors:
>
>               Point est. Upper C.I.
> (Intercept)            1       1.00
> bef_af1                1       1.00
> Site.Site 1b          1       1.01
> Site.Site 2           1       1.00
> Site.Site 3           1       1.00
> Site.Site 4           1       1.00
> Site.Site 5           1       1.00
> Site.Site 6           1       1.00
> Site.Site 7           1       1.00
> Site.Site 8           1       1.00
> Site.Site 9           1       1.00
>
> Multivariate psrf
>
> 1.01
>
> Furthermore, I checked how much variance was explained by the random effect:
>
> HPDinterval(chain1$VCV[, "Site"]/(chain1$VCV[, "Site"] +
> chain1$VCV[, "units"]))
>
>         lower     upper
> var1 0.388643 0.8729441
> attr(,"Probability")
> [1] 0.95
>
> I interpreted this as follows: the majority of the variation in the
> data was explained by the difference between the locations. Both the
> 'Site' random effect and the residuals 'Units' posterior
> distribution plots show that both are located well away from zero
> (Plotted via plot(chain1$VCV)). Is my interpretation correct? To me
> it makes sense as the numbers of birds between the locations varied
> a lot (from mean of 2 birds at one location to a mean of 20 birds at
> another location).
> When a prior is not given in MCMCglmm() what defaults does it use?
> From the documentation I can see prior=NULL, but I assume that some
> prior must be given for Bayesian models. Are the defaults B$mu=0
> and B$V=I*1e+10, where I is an identity matrix of appropriate
> dimension? I therefore assume that the default priors in MCMCglmm
> are centered on zero and since the posterior distribution is well
> away from zero, that therefore the random effects explain some
> variation in the data (especially 'site' which explains 38.8-87.3%).
> Is this correct?
>
> Thanks!
>
> Mieke
