[R-sig-ME] Convergence in binomial MCMCglmm
j.hadfield at ed.ac.uk
Wed Jun 8 06:59:17 CEST 2011
Slice sampling (slice=TRUE) can be more efficient than
Metropolis-Hastings updates (the default), but the mixing properties
of models with binary data are often poor, and only a marginal gain in
efficiency can be expected from slice sampling.
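For reference, slice sampling is requested per model via the `slice` argument to `MCMCglmm()`. A minimal sketch (the formula, data frame and chain settings are placeholders, not from the original post):

```r
library(MCMCglmm)

## Hypothetical binary-response model; slice = TRUE updates the latent
## variables by slice sampling rather than Metropolis-Hastings.
m.slice <- MCMCglmm(survived ~ food,
                    random = ~ id + dam + year,
                    family = "categorical",
                    data   = surv.data,   # placeholder data frame
                    slice  = TRUE,
                    nitt = 130000, thin = 100, burnin = 30000)
```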
Parameter expansion can dramatically increase efficiency if the
posterior density for the variance components has support at small
values, and it also induces a prior with some nice properties. In
general, however, I would be wary of fiddling with priors to achieve
better mixing: these are separate issues, and poor mixing is really a
shortcoming of the algorithm in the context of the model/data rather
than a problem with the probability model per se (although poor mixing
is often associated with ill-posed problems).
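In MCMCglmm, parameter expansion is switched on by adding `alpha.mu` and `alpha.V` to the G-structure priors. A sketch of a prior of the kind discussed here, with the residual variance fixed at 1 (it is not identifiable with binary data) and parameter-expanded priors on three random effects (the component names and values are illustrative, not a recommendation):

```r
## Residual variance fixed at 1; each G component gets a
## parameter-expanded prior (V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
## which mixes better when the variance has support near zero.
prior.px <- list(
  R = list(V = 1, fix = 1),
  G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
           G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
           G3 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000)))
```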
Very rare or very common outcomes can cause numerical problems in
MCMCglmm because of under/overflow in the logit/probit functions, and
these should be watched for. The classic symptom is a posterior trace
that looks reasonable but then moves into extreme values for some
period, which generates high autocorrelation. In extreme cases
(e.g. the extreme-category problem) the posterior is never reasonable
without a fairly strong prior, and very large values for the location
effects and variance components are sampled. As a rule of thumb, if the
absolute values of the latent variables are <20 for logit models and <7
for probit models there should be no problem.
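One way to check this rule of thumb is to refit with `pl = TRUE` so the posterior of the latent variables is stored in the model object (as `$Liab`) and then inspect its range. A sketch, assuming a placeholder model:

```r
## Store the latent variables (pl = TRUE), then check their magnitude;
## |latent variable| near 20 (logit) or 7 (probit) signals numerical trouble.
m <- MCMCglmm(survived ~ food, random = ~ id,
              family = "categorical", data = surv.data,  # placeholders
              pl = TRUE)
range(m$Liab)
max(abs(m$Liab))
```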
I'm not sure that censoring can be adequately handled when treating
survival as a series of binary 0 outcomes ending in a 1, but I would be
interested if anyone knows how it could be dealt with?
On Tue, 2011-06-07 at 14:56 +0100, Adam Hayward wrote:
> Dear list members,
> I am attempting to investigate the effects of food availability in a given
> year on individual survival using the MCMCglmm package, and I'm running into
> problems with model mixing.
> Food data is available for a subset of the overall study period, and so some
> individuals are born before the food data begins, and some do not die until
> after the food data ends; therefore, not every individual is sampled across
> their entire lifespan, and not every individual dies. I am using family =
> categorical, and my response variable is binomial (survived or died), with
> every individual having a record for every year of life that falls within
> the period of food data. Therefore, the vast majority of responses are 1
> (28,636 out of 29,577). I am using random effects of individual identity,
> maternal identity, and the year of sampling, all of which are substantial
> (posterior distribution >> 0). I imagine that a sensible prior would be to
> fix the residual variance at 1, and to apply an inverse-gamma prior to the
> other random effects. However, this, and a number of other priors that Ive
> tried have produced high autocorrelation (in the order of >0.7) for the
> variance components and low effective sample size (albeit using only 13,000
> iterations). I wonder whether the problem is the sheer number of 1s in the
> dataframe, and whether the solution is in specifying a better prior, or
> simply in running the analysis for much longer. Any advice from anyone with
> experience of analysis of this type of data or these types of problems would
> be gratefully received.
> Best regards,