[R-sig-ME] rare binary outcome, MCMCglmm, and priors (related to separation)

Jarrod Hadfield j.hadfield at ed.ac.uk
Tue Aug 31 11:43:27 CEST 2010


Hi Dave,

With respect to the prior specification for the fixed effects, you may  
want to make the variance larger.  Perhaps something like:

prior$B$V <- diag(7) * (3 + pi^2/3)
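Putting that together with the rest of your prior (a sketch only; I am keeping your R and G specifications from the call below unchanged, and assuming 7 fixed effects as in your model):

```r
# Fixed residual variance, inflated fixed-effect prior variance,
# and your original weak inverse-Wishart prior on the person variance.
prior <- list(R = list(V = 1, fix = 1),
              B = list(mu = rep(0, 7), V = diag(7) * (3 + pi^2/3)),
              G = list(G1 = list(V = 1, nu = 0.002)))
```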

The motivation behind this is to choose a prior for b under which
plogis(Xb + Zu + e) would be close to uniform after marginalising over
the random effects u and e. pi^2/3 is the variance of the logistic
distribution (the cdf of which is the inverse logit function), and 3
is the variance of Zu + e assuming Z is an identity matrix (1 for the
residual variance + ~2 for the person variance). You can see it is
pretty close for an intercept:

# draw intercepts from the proposed prior, then average the implied
# probabilities over the distribution of Zu + e
priorB <- rnorm(1000, 0, sqrt(3 + pi^2/3))
priorMB <- numeric(1000)
for (i in 1:1000) {
  priorMB[i] <- mean(plogis(priorB[i] + rnorm(1000, 0, sqrt(3))))
}
hist(priorMB)  # approximately uniform on (0, 1)

This example works for a model with a single intercept. When fitting a
categorical predictor I usually remove the intercept (-1) so that the
distribution is approximately uniform for all levels of the predictor.
For continuous covariates and interactions it will be a bit more
involved, and you should probably read Gelman et al. (2008), Annals of
Applied Statistics, 1360-1383.
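For reference, their default is (if I remember it correctly) an independent Cauchy(0, 10) prior on the intercept and Cauchy(0, 2.5) priors on the coefficients of rescaled inputs. A quick simulation of what that implies on the probability scale (a sketch, not the paper's exact recipe):

```r
set.seed(1)
b0 <- rcauchy(1000, 0, 10)    # their default intercept scale
b1 <- rcauchy(1000, 0, 2.5)   # their default coefficient scale
p  <- plogis(b0 + b1)         # probability at one (rescaled) unit of x
hist(p)                       # heavy tails: mass piles up near 0 and 1
```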

Using a prior with a variance of one will shrink the estimates towards
less extreme values, which may explain some of the differences between
models. However, if anything this new prior is likely to make the
mixing worse rather than better. Two options that may speed up mixing:
pass slice=TRUE in the call to MCMCglmm, which updates the latent
variables by slice sampling rather than Metropolis-Hastings; or use
parameter-expanded priors for G, although from your output it does not
look like the variance is hitting zero, so that is unlikely to improve
things.
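For what it's worth, the two options together would look something like this (a sketch only; the formula is taken from your call below, and alpha.mu/alpha.V are the working-parameter mean and variance of the parameter-expanded prior):

```r
library(MCMCglmm)

# V = 1, nu = 1 with alpha.mu = 0 and a large alpha.V gives a
# heavy-tailed (scaled F) prior on the person variance
prior.px <- list(R = list(V = 1, fix = 1),
                 B = list(mu = rep(0, 7), V = diag(7) * (3 + pi^2/3)),
                 G = list(G1 = list(V = 1, nu = 1,
                                    alpha.mu = 0, alpha.V = 1000)))
m.px <- MCMCglmm(ipv ~ (alc.cut + angc + log(provc + 0.03))^2,
                 random = ~ person, family = "categorical",
                 data = ipv.df, prior = prior.px,
                 slice = TRUE,  # slice-sample the latent variables
                 nitt = 2000000, burnin = 1000000, thin = 1000)
```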

Cheers,

Jarrod





On 30 Aug 2010, at 21:37, David Atkins wrote:

>
> Some colleagues have collected data from 184 females in dating  
> relationships.  Data were collected daily using PDAs; the outcome is  
> a binary indicator of whether any physical aggression occurred  
> (intimate partner violence, or IPV).
>
> They are interested in 3 covariates:
>
> -- alcohol use: yes/no
> -- anger: rated on 1-5 scale
> -- verbal aggression: sum of handful of items, with 0-15 scale
>
> Their hypothesis is that the interaction of all 3 covariates will  
> lead to the highest likelihood of IPV.  As you might expect, the  
> outcome is very rare with 51 instances of IPV out of 8,269 days of  
> data, and 158 women (out of 184) reported no instances of IPV.
>
> Question 1: Given that a GLMM will assume a normal distribution for  
> the person-specific baserate in IPV, is this data even appropriate  
> for GLMM or should they be looking elsewhere (perhaps GEE)?
>
> That said, for some (unknown) proportion of individuals, there  
> probably would be instances of IPV if the data collection period  
> were longer. Thus, perhaps there is some basis for assuming a  
> distribution across people, even if the observed data for some  
> individuals are all zeroes.
>
> To present some of the data (and I can check to see if it would be  
> okay to make the data available), I dichotomized both anger and  
> verbal aggression ("prov.cut" below):
>
>  ang.cut prov.cut alc.cut ipv.yes ipv.no
> 1       0        0       0       0     3918
> 2       0        0       1       0        1
> 3       1        0       0       5     2381
> 4       1        0       1       1      292
> 5       1        1       0      36     1471
> 6       1        1       1       9      257
>
> Thus, the instances of IPV are more likely when there is anger and  
> verbal aggression; alcohol is a little less clear.  (And, if the  
> association of anger and verbal aggression with IPV seems  
> tautological, there has been debate about different forms of IPV,  
> where some research has pointed to "cold" aggression.)
>
> Not surprisingly, analyses using either glmer() or MCMCglmm() show  
> signs of partial separation, with some whopping odds-ratios and 95%  
> CI spanning a couple orders of magnitude.
>
> I have read a bit about the problems of separation in logistic  
> regression and know that Gelman et al. suggest Bayesian priors as one  
> "solution".  Moreover, I see in Jarrod Hadfield's course notes that  
> his multinomial example has a "structural" zero that he addresses  
> via priors on pp. 96-97, though I confess I don't quite follow  
> exactly what he has done (and why).
>
> If I just let MCMCglmm cook on a regression with all 2-way  
> interactions for a long while:
>
> prior <- list(R = list(V = 1, fix = 1),
>               B = list(mu = rep(0, 7), V = diag(7)),
>               G = list(G1 = list(V = 1, nu = 0.002)))
> lr.mcmc <- MCMCglmm(ipv ~ (alc.cut + angc + log(provc + 0.03))^2,
>                     data = ipv.df,
>                     family = "categorical", verbose = TRUE,
>                     prior = prior,
>                     nitt = 2000000, burnin = 1000000, thin = 1000,
>                     random = ~ person)
>
> The answers are less extreme than what I get with glmer, perhaps  
> suggesting this is wandering toward the "correct" solution, though  
> there are also plenty of indicators that we aren't there yet:
>
> > summary(lr.mcmc)
>
> Iterations = 1000001:1999001
> Thinning interval = 1000
> Sample size = 1000
>
> DIC: 379.972
>
> G-structure:  ~person
>
>       post.mean l-95% CI u-95% CI eff.samp
> person     2.287   0.6775    4.206    194.0
>
> R-structure:  ~units
>
>      post.mean l-95% CI u-95% CI eff.samp
> units         1        1        1        0
>
> Location effects: ipv ~ (alc.cut + angc + log(provc + 0.03))^2
>
>                           post.mean l-95% CI u-95% CI eff.samp  pMCMC
> (Intercept)                -2.58724 -3.49492 -1.65250   245.20 <0.001 ***
> alc.cut                     0.49512 -0.75268  1.99397  1000.00  0.464
> angc                        0.02664 -0.34227  0.41365   283.90  0.880
> log(provc + 0.03)           1.36626  0.98743  1.70863    28.69 <0.001 ***
> alc.cut:angc               -0.16519 -0.79299  0.41949   683.56  0.590
> alc.cut:log(provc + 0.03)  -0.02631 -0.53118  0.55065   157.34  0.898
> angc:log(provc + 0.03)     -0.26132 -0.40141 -0.10407    74.98  0.004 **
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> I haven't run multiple chains yet, but the effective sample sizes  
> and trace plots already suggest we ain't there yet.  My specific  
> question is whether there is an alternative prior specification for  
> the fixed effects that would be more appropriate.
>
> I would appreciate any and all thoughts here, including if this just  
> doesn't seem like an appropriate data/question for GLMMs.
>
> sessionInfo below.
>
> cheers, Dave
>
> > sessionInfo()
> R version 2.11.1 (2010-05-31)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats4    splines   stats     graphics  grDevices utils      
> datasets
> [8] methods   base
>
> other attached packages:
> [1] MCMCglmm_2.06      corpcor_1.5.7      ape_2.5-3
> [4] coda_0.13-5        Matrix_0.999375-44 lattice_0.19-10
> [7] tensorA_0.35       Hmisc_3.8-2        modeltools_0.2-16
> [10] mvtnorm_0.9-92     survival_2.36-1
>
> loaded via a namespace (and not attached):
> [1] cluster_1.13.1   coin_1.0-16      colorspace_1.0-1 gee_4.13-15
> [5] grid_2.11.1      lme4_0.999375-35 nlme_3.1-96      party_0.9-9998
> [9] rpart_3.1-46     tools_2.11.1
>
> -- 
> Dave Atkins, PhD
> Research Associate Professor
> Department of Psychiatry and Behavioral Science
> University of Washington
> datkins at u.washington.edu
>
> Center for the Study of Health and Risk Behaviors (CSHRB)		
> 1100 NE 45th Street, Suite 300 	
> Seattle, WA  98105 	
> 206-616-3879 	
> http://depts.washington.edu/cshrb/
> (Mon-Wed)	
>
> Center for Healthcare Improvement, for Addictions, Mental Illness,
>  Medically Vulnerable Populations (CHAMMP)
> 325 9th Avenue, 2HH-15
> Box 359911
> Seattle, WA 98104
> http://www.chammp.org
> (Thurs)
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
