[R-sig-ME] MCMC model selection reference
Jarrod Hadfield
j.hadfield at ed.ac.uk
Sun Apr 1 20:30:03 CEST 2012
Hi,
My understanding of DIC (and information criteria generally) is
woeful, but here are my thoughts on DIC, which I hope others will
correct if they disagree.
Does DIC work in principle? Yes. Could it work in practice?
Sometimes. Does it work in practice? Rarely (for hierarchical models).
DIC needs to be "focused". Imagine you have single Gaussian
observations (y) on children within schools. We have fixed effects b,
random effects u, and variance parameters Vs (between school variance)
and Ve (within school variance). We also have the fixed-effect design
matrix X and random-effect design matrix Z. We could calculate the
deviance using two likelihoods:
a) dmvnorm(y, X%*%b+Z%*%u, I*Ve)
b) dmvnorm(y, X%*%b, Z%*%t(Z)*Vs+I*Ve)
In a) we are conditioning on the school effects; in b) we marginalise
them. The focus in a) is of the form "can we predict new observations
in *these* schools" and in b) "can we predict new observations in
*new* schools".
As a parent you're probably interested in a); as a scientist you're
probably interested in b).
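
To make the two focuses concrete, here is a minimal sketch in base R
that evaluates both deviances on a toy data set. Everything in it is
an assumption made for illustration: the simulated data, the parameter
values and the helper ldmvnorm (a hand-written multivariate normal
log density, used so that no package is required;
mvtnorm::dmvnorm(..., log = TRUE) should give the same number).

```r
# Hypothetical toy data: 3 schools with 4 children each (made-up values).
set.seed(1)
n_school <- 3
n_child <- 4
school <- rep(seq_len(n_school), each = n_child)
n <- length(school)
X <- cbind(1, rnorm(n))                  # intercept plus one covariate
Z <- model.matrix(~ factor(school) - 1)  # school indicator matrix
b <- c(2, 0.5)                           # fixed effects
Vs <- 1                                  # between-school variance
Ve <- 0.5                                # within-school variance
u <- rnorm(n_school, 0, sqrt(Vs))        # school effects
y <- as.vector(X %*% b + Z %*% u + rnorm(n, 0, sqrt(Ve)))

# Multivariate normal log density via the Cholesky factor of V.
ldmvnorm <- function(y, mu, V) {
  r <- y - as.vector(mu)
  R <- chol(V)                            # V = t(R) %*% R
  z <- backsolve(R, r, transpose = TRUE)  # solves t(R) z = r
  -0.5 * (length(y) * log(2 * pi) + 2 * sum(log(diag(R))) + sum(z^2))
}

I <- diag(n)
# a) conditional on the school effects u
dev_a <- -2 * ldmvnorm(y, X %*% b + Z %*% u, I * Ve)
# b) school effects marginalised into the covariance matrix
dev_b <- -2 * ldmvnorm(y, X %*% b, Z %*% t(Z) * Vs + I * Ve)
c(conditional = dev_a, marginal = dev_b)
```

Because the covariance in a) is diagonal, dev_a is just a sum of
univariate normal terms, whereas b) needs the full n x n covariance
matrix.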
MCMCglmm (and I believe WinBUGS, depending on how the model is
parameterised) focuses at the highest level, a). The reason for this is
that MCMCglmm Gibbs samples u and then Gibbs samples Vs conditional on
u, without the need to calculate b), which is expensive (if DIC=TRUE,
a) will be calculated, and this is easy). Presumably WinBUGS could
calculate a) or b) depending on how it is set up, but I think a) is
more usual (?) because of performance issues.
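
As a rough sketch of why the conditional focus is cheap, the code
below computes DIC from posterior draws using the definition in
Spiegelhalter et al. (2002): DIC = Dbar + pD, with pD = Dbar minus the
deviance at the posterior means. It is not MCMCglmm's internal code;
the function names, the argument names and the toy draws are all
assumptions for illustration, as is the choice to plug in the
posterior mean of Ve.

```r
# Deviance focused at the conditional level: given b, u and Ve the
# observations are independent, so only univariate normals are needed.
dev_cond <- function(y, eta, Ve) {
  -2 * sum(dnorm(y, mean = eta, sd = sqrt(Ve), log = TRUE))
}

# eta_draws: (iterations x observations) matrix of sampled X b + Z u
# Ve_draws:  vector of sampled residual variances
# (hypothetical argument names, not MCMCglmm output slots)
dic_cond <- function(y, eta_draws, Ve_draws) {
  D <- vapply(seq_along(Ve_draws),
              function(i) dev_cond(y, eta_draws[i, ], Ve_draws[i]),
              numeric(1))
  Dbar <- mean(D)                                           # mean deviance
  Dhat <- dev_cond(y, colMeans(eta_draws), mean(Ve_draws))  # deviance at means
  pD <- Dbar - Dhat                                         # effective parameters
  c(Dbar = Dbar, pD = pD, DIC = Dbar + pD)
}

# Toy check with made-up numbers: y ~ N(mu, 1) with a flat prior on mu,
# so the posterior of mu is N(mean(y), 1/n) and pD should be close to 1.
set.seed(2)
y <- rnorm(10, mean = 1, sd = 1)
mu_draws <- rnorm(500, mean(y), sqrt(1 / length(y)))
eta_draws <- matrix(mu_draws, nrow = 500, ncol = length(y))
dic_cond(y, eta_draws, rep(1, 500))
```

Each iteration costs one pass over the data. The marginal version
would need a multivariate normal density with a dense covariance
matrix at every iteration.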
With over-dispersed non-Gaussian data the case for DIC (as
implemented) is very bad, because the highest level is the latent
variable (linear predictor). Let's imagine our observations on
children were how many times they missed the bus and we treated them
as log-normal Poisson. DIC would be focused at "can we predict how
many times *these* children miss the bus".
Modelling over-dispersion using a two-parameter distribution (without
observation-level effects), perhaps a negative binomial in our
example, may get us back to "can we predict how many times children
from *these* schools miss the bus", but getting down to b) may be more
difficult because with non-Gaussian data the random effects cannot be
marginalised analytically.
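
Although there is no closed form, a single observation-level effect
can be integrated out numerically. The sketch below does this for the
Poisson log-normal case with base R's integrate(), and compares the
deviance focused at the latent variable with the deviance after
marginalising the observation-level effect. The function name
dpoisln, the integration limits of +/- 8 standard deviations and all
numbers are assumptions for illustration. It does not marginalise the
school effects, which would need a further integral per school.

```r
# Marginal probability of a count y given the linear predictor eta
# (fixed plus school effects) and the observation-level variance Ve:
# integrate Poisson(y | exp(eta + e)) against e ~ N(0, Ve).
dpoisln <- function(y, eta, Ve) {
  s <- sqrt(Ve)
  integrand <- function(e) dpois(y, exp(eta + e)) * dnorm(e, 0, s)
  integrate(integrand, lower = -8 * s, upper = 8 * s)$value
}

# Made-up example: four children sharing eta = 0.5, with Ve = 0.3.
y <- c(0, 1, 3, 7)
eta <- 0.5
Ve <- 0.3
e <- c(-0.9, -0.4, 0.5, 1.4)  # hypothetical sampled latent residuals

# Focus at the latent variable: "can we predict *these* children".
dev_latent <- -2 * sum(dpois(y, exp(eta + e), log = TRUE))
# Focus after integrating out e: children in general, given eta.
dev_marg <- -2 * sum(log(vapply(y, dpoisln, numeric(1), eta = eta, Ve = Ve)))
c(latent = dev_latent, marginal = dev_marg)
```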
For non-Gaussian data I never use DIC, and have seriously considered
removing it from MCMCglmm.
Cheers,
Jarrod
Quoting "Steven J. Pierce" <pierces1 at msu.edu> on Sun, 1 Apr 2012
09:47:03 -0400:
> Here are a couple references on DIC that I happen to have handy:
>
> Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A.
> (2002). Bayesian measures of model complexity and fit. Journal of the Royal
> Statistical Society: Series B (Statistical Methodology), 64(4), 583-639.
> doi: 10.1111/1467-9868.00353 http://www.jstor.org/stable/3088806
>
> Barnett, A. G., Koper, N., Dobson, A. J., Schmiegelow, F., & Manseau, M.
> (2010). Using information criteria to select the correct variance-covariance
> structure for longitudinal data in ecology. Methods in Ecology and
> Evolution, 1(1), 15-24. doi: 10.1111/j.2041-210X.2009.00009.x
> http://dx.doi.org/10.1111/j.2041-210X.2009.00009.x
>
>
> Steven J. Pierce, Ph.D.
> Associate Director
> Center for Statistical Training & Consulting (CSTAT)
> Michigan State University
> E-mail: pierces1 at msu.edu
> Web: http://www.cstat.msu.edu
>
> -----Original Message-----
> From: Ray Danner [mailto:danner.ray at gmail.com]
> Sent: Saturday, March 31, 2012 2:24 PM
> To: r-sig-mixed-models at r-project.org
> Subject: [R-sig-ME] MCMC model selection reference
>
> Dear list,
>
> I'm looking for guidance on model selection using DIC values. I'm
> particularly interested in comparing mixed models created with the
> package MCMCglmm. I currently use AIC for my models built with lme
> and (g)lmer and like the ability to calculate evidence ratios and
> model average predictions, which are very easy for readers to
> conceptualize. AICcmodavg is great for these things.
>
> Can anyone recommend a resource that describes the appropriate use of
> DIC for model selection (and its limitations)? I'm mainly an
> ecologist, so a less-technical treatment would be ideal.
>
> My main questions are:
> 1. Can DIC be used to select among mixed models?
> Kery and Schaub (2012 p. 42) raise concerns about counting the correct
> number of parameters and state that WinBUGS does not calculate them
> appropriately, though Millar (2009) provides a method that is
> appropriate for hierarchical models. On the other hand, Saveliev et
> al. (2009) use DIC to compare models with random effects built with
> the BRugs package. Hadfield's MCMCglmm Tutorial says that lower DIC
> is better, but doesn't give details about use.
>
> 2. Any rules of thumb on what constitutes sufficiently large deltaDIC
> values? Are evidence ratios acceptable?
>
> 3. Can DIC be used to calculate model average predictions?
>
> Thanks in advance and please forgive me if I missed your publication.
> Ray
>
>
> Refs
> Kery and Schaub. 2012. Bayesian Population Analysis Using WinBUGS: A
> Hierarchical Perspective.
> Millar. 2009. Comparison of hierarchical Bayesian models for
> overdispersed count data using DIC and Bayes' Factors. Biometrics
> 65:962-969.
> Saveliev et al. 2009. Ch. 23 in Zuur, Mixed Effects Models and
> Extensions in Ecology with R.
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.