[R-sig-ME] MCMC model selection reference

Tue Apr 17 23:05:50 CEST 2012

Hi,

In regard to your last comment below about not using DIC for non-Gaussian data, would it be reasonable to instead conduct AIC-based model selection (say with glmer) and then run the preferred model in MCMCglmm for inference?

Many thanks,

Steve
On Apr 1, 2012, at 2:30 PM, Jarrod Hadfield wrote:

> Hi,
> 
> My understanding of DIC (and information criteria generally) is woeful, but here are my thoughts on DIC - which I hope others will correct if they disagree.
> 
> Does DIC work in principle - yes; could it work in practice - sometimes; does it work in practice - rarely (for hierarchical models).
> 
> DIC needs to be "focused". Imagine you have single Gaussian observations (y) on children within schools.  We have fixed effects b, random effects u, and variance parameters Vs (between school variance) and Ve (within school variance). We also have the fixed-effect design matrix X and random-effect design matrix Z.  We could calculate the deviance using two likelihoods:
> 
> a) dmvnorm(y, X%*%b+Z%*%u, I*Ve)
> b) dmvnorm(y, X%*%b, Z%*%t(Z)*Vs+I*Ve)
> 
> In a) we are conditioning on the school effects; in b) we marginalise them. The focus in a) is of the form "can we predict new observations in *these* schools" and in b) "can we predict new observations in *new* schools".
> 
> As a parent you're probably interested in a); as a scientist you're probably interested in b).
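
The two likelihoods in a) and b) can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, the parameter values (b, Vs, Ve) and the helper ldmvnorm() are made up for this example, and the log-density is written out by hand so that no package providing dmvnorm() is needed.

```r
# Simulated toy data: 5 schools with 4 children each (all values are illustrative).
set.seed(1)
n_school <- 5
n_child  <- 4
school   <- factor(rep(seq_len(n_school), each = n_child))
n        <- length(school)

X  <- matrix(1, n, 1)               # intercept-only fixed-effect design matrix
Z  <- model.matrix(~ school - 1)    # one indicator column per school
b  <- 10                            # fixed effect (assumed value)
Vs <- 4                             # between-school variance (assumed value)
Ve <- 1                             # within-school variance (assumed value)
u  <- rnorm(n_school, 0, sqrt(Vs))  # school effects
y  <- as.vector(X %*% b + Z %*% u + rnorm(n, 0, sqrt(Ve)))

# Hypothetical helper: log-density of a multivariate normal.
ldmvnorm <- function(y, mean, sigma) {
  r      <- y - as.vector(mean)
  logdet <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
  -0.5 * (length(y) * log(2 * pi) + logdet + sum(r * solve(sigma, r)))
}

I_n <- diag(n)

# a) condition on the school effects: "new observations in *these* schools"
dev_a <- -2 * ldmvnorm(y, X %*% b + Z %*% u, I_n * Ve)

# b) marginalise the school effects: "new observations in *new* schools"
dev_b <- -2 * ldmvnorm(y, X %*% b, Z %*% t(Z) * Vs + I_n * Ve)

c(conditional = dev_a, marginal = dev_b)
```

The only difference between the two calls is where the school effects appear: in the mean for a), and in the covariance matrix for b).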
> 
> MCMCglmm (and I believe WinBUGS, depending on how the model is parameterised) focuses at the highest level, a). The reason for this is that MCMCglmm Gibbs samples u and then Gibbs samples Vs conditional on u, without the need to calculate b), which is expensive (if DIC=TRUE, a) will be calculated, and this is easy). Presumably WinBUGS could calculate a) or b) depending on how it is set up, but I think a) is more usual (?) because of performance issues.
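
To show why a) is cheap once u has been sampled, here is a minimal sketch of DIC at the conditional focus, using the standard definition from Spiegelhalter et al. (2002): pD = Dbar - D(posterior means) and DIC = Dbar + pD. The function names and the stand-in "draws" are assumptions for illustration. They are not output from MCMCglmm or WinBUGS, and the thread does not say that either program computes DIC in exactly this way.

```r
# Deviance at focus a): given b and u, the y_i are independent N(mu_i, Ve).
deviance_a <- function(y, mu, Ve) {
  -2 * sum(dnorm(y, mu, sqrt(Ve), log = TRUE))
}

# mu_draws: one row per MCMC draw, holding X b + Z u for that draw.
# Ve_draws: one residual variance per draw.
dic_conditional <- function(y, mu_draws, Ve_draws) {
  D <- vapply(seq_along(Ve_draws),
              function(k) deviance_a(y, mu_draws[k, ], Ve_draws[k]),
              numeric(1))
  D_bar <- mean(D)                                           # posterior mean deviance
  D_hat <- deviance_a(y, colMeans(mu_draws), mean(Ve_draws)) # deviance at posterior means
  p_D   <- D_bar - D_hat                                     # effective number of parameters
  c(D_bar = D_bar, D_hat = D_hat, p_D = p_D, DIC = D_bar + p_D)
}

# Stand-in draws so the function can be run; NOT a real posterior sample.
set.seed(2)
y        <- rnorm(20, 10, 2)
mu_draws <- matrix(rnorm(200 * 20, mean = rep(y, each = 200), sd = 0.5), nrow = 200)
Ve_draws <- 1 / rgamma(200, shape = 20, rate = 20)

out <- dic_conditional(y, mu_draws, Ve_draws)
out
```

Each deviance evaluation is a sum of univariate normal log-densities, so no matrix of the form Z%*%t(Z)*Vs+I*Ve has to be built or inverted.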
> 
> With over-dispersed non-Gaussian data the case for DIC (as implemented) is very bad, because the highest level is the latent variable (linear predictor).  Let's imagine our observations on children were how many times they missed the bus and we treated them as log-normal Poisson. DIC would be focused at "can we predict how many times *these* children miss the bus".
> 
> Modelling over-dispersion using a two-parameter distribution (without observational-level effects), perhaps a negative binomial in our example, may get us back to "can we predict how many times children from *these* schools miss the bus", but getting down to b) may be more difficult because with non-Gaussian data the random effects cannot be marginalised analytically.
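
The lack of an analytical marginal can be illustrated with a deliberately simplified case: plain Poisson counts for the children of one school, with a single normally distributed school effect and no over-dispersion term. The counts, eta, Vs and the value u = 0.3 are all made-up values for this sketch, and lik_given_u() is a hypothetical helper. The school effect has to be integrated out numerically, here with integrate().

```r
# Times each child in one school missed the bus (made-up counts).
y   <- c(0, 2, 1, 3)
eta <- log(1.5)   # fixed part of the linear predictor (assumed value)
Vs  <- 0.5        # between-school variance (assumed value)

# Conditional likelihood: the school effect u is treated as known.
# vapply() makes the function vectorised over u, as integrate() requires.
lik_given_u <- function(u) {
  vapply(u, function(ui) prod(dpois(y, exp(eta + ui))), numeric(1))
}

# Marginal likelihood: integrate u out against its N(0, Vs) distribution.
marginal <- integrate(function(u) lik_given_u(u) * dnorm(u, 0, sqrt(Vs)),
                      lower = -Inf, upper = Inf)

dev_conditional <- -2 * log(lik_given_u(0.3))  # focus on *this* school (u = 0.3 assumed)
dev_marginal    <- -2 * log(marginal$value)    # focus on a *new* school

c(conditional = dev_conditional, marginal = dev_marginal)
```

With one school this is a one-dimensional integral. With many random-effect levels, and repeated at every MCMC iteration, the same calculation becomes much more expensive.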
> 
> For non-Gaussian data I never use DIC, and have seriously considered removing it from MCMCglmm.
> 
> Cheers,
> 
> Jarrod
> 
> 
> 
> Quoting "Steven J. Pierce" <pierces1 at msu.edu> on Sun, 1 Apr 2012 09:47:03 -0400:
> 
>> Here are a couple references on DIC that I happen to have handy:
>> 
>> Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A.
>> (2002). Bayesian measures of model complexity and fit. Journal of the Royal
>> Statistical Society: Series B (Statistical Methodology), 64(4), 583-639.
>> doi: 10.1111/1467-9868.00353  http://www.jstor.org/stable/3088806
>> 
>> Barnett, A. G., Koper, N., Dobson, A. J., Schmiegelow, F., & Manseau, M.
>> (2010). Using information criteria to select the correct variance-covariance
>> structure for longitudinal data in ecology. Methods in Ecology and
>> Evolution, 1(1), 15-24. doi: 10.1111/j.2041-210X.2009.00009.x
>> http://dx.doi.org/10.1111/j.2041-210X.2009.00009.x
>> 
>> 
>> Steven J. Pierce, Ph.D.
>> Associate Director
>> Center for Statistical Training & Consulting (CSTAT)
>> Michigan State University
>> E-mail: pierces1 at msu.edu
>> Web: http://www.cstat.msu.edu
>> 
>> -----Original Message-----
>> From: Ray Danner [mailto:danner.ray at gmail.com]
>> Sent: Saturday, March 31, 2012 2:24 PM
>> To: r-sig-mixed-models at r-project.org
>> Subject: [R-sig-ME] MCMC model selection reference
>> 
>> Dear list,
>> 
>> I'm looking for guidance on model selection using DIC values.  I'm
>> particularly interested in comparing mixed models created with the
>> package MCMCglmm.  I currently use AIC for my models built with lme
>> and (g)lmer and like the ability to calculate evidence ratios and
>> model average predictions, which are very easy for readers to
>> conceptualize.  AICcmodavg is great for these things.
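
For readers unfamiliar with the AIC quantities mentioned above, here is a minimal base-R sketch of Akaike weights, evidence ratios and a model-averaged prediction. The AIC values and predictions are hypothetical numbers chosen for illustration, and the sketch does not use AICcmodavg. Whether the same arithmetic is appropriate for DIC is exactly what the questions below ask.

```r
# Hypothetical AIC values for three candidate models.
aic <- c(m1 = 210.4, m2 = 212.1, m3 = 218.9)

delta <- aic - min(aic)                                # AIC differences from the best model
w     <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))    # Akaike weights, summing to 1

# Evidence ratio of the best model against each candidate.
evidence_ratio <- max(w) / w

# Model-averaged prediction for one new observation,
# given each model's (hypothetical) prediction.
pred     <- c(m1 = 3.2, m2 = 3.6, m3 = 2.9)
pred_avg <- sum(w * pred)

round(rbind(delta, w, evidence_ratio), 3)
pred_avg
```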
>> 
>> Can anyone recommend a resource that describes the appropriate use of
>> DIC for model selection (and its limitations)?  I'm mainly an
>> ecologist, so a less-technical treatment would be ideal.
>> 
>> My main questions are:
>> 1. Can DIC be used to select among mixed models?
>> Kery and Schaub (2012 p. 42) raise concerns about counting the correct
>> number of parameters and state that WinBUGS does not calculate them
>> appropriately, though Millar (2009) provides a method that is
>> appropriate for hierarchical models.  On the other hand, Saveliev et
>> al. (2009) use DIC to compare models with random effects built with
>> the BRugs package.  Hadfield's MCMCglmm Tutorial says that lower DIC
>> is better, but doesn't give details about use.
>> 
>> 2. Any rules of thumb on what constitutes sufficiently large deltaDIC
>> values?  Are evidence ratios acceptable?
>> 
>> 3. Can DIC be used to calculate model average predictions?
>> 
>> Thanks in advance and please forgive me if I missed your publication.
>> Ray
>> 
>> 
>> Refs
>> Kery and Schaub. 2012. Bayesian Population Analysis Using WinBUGS: A
>> Hierarchical Perspective.
>> Millar. 2009. Comparison of hierarchical Bayesian models for
>> overdispersed count data using DIC and Bayes' Factors. Biometrics
>> 65:962-969.
>> Saveliev et al. 2009. Ch. 23 in Zuur, Mixed Effects Models and
>> Extensions in Ecology with R.
>> 
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> 
>> 
> 
> 
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 