[R-sig-ME] MCMCglmm models and (quasi-)complete separation

Thu May 24 14:16:54 CEST 2018

Dear fellow R users,

I have recently started using the MCMCglmm R package to analyse some of my
problematic data, which suffer severely from (quasi-)complete separation.

I have followed Ben Bolker's suggestion of putting zero-mean Normal priors
on the fixed effects when analysing this kind of data.
(https://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html)

My model is:

k <- 8  # number of fixed effects:
        # intercept + main effects + interactions

# Normal(0, 9) prior on each fixed effect (i.e. a prior sd of 3)
prior.c <- list(B=list(V=diag(9,k), mu=rep(0,k)),
                R=list(V=1,fix=1),
                G=list(G1=list(V=1,nu=1,alpha.mu=0, alpha.V=1000),
                       G2=list(V=1,nu=1,alpha.mu=0, alpha.V=1000),
                       G3=list(V=1,nu=1,alpha.mu=0, alpha.V=1000)))

nsamp <- 10000
THIN <- 900
BURNIN <- 10000
NITT <- BURNIN + THIN*nsamp
model3 <- MCMCglmm(survival ~ Site*b*c,
                   random = ~ x + Field + Field_block,
                   data = dset,
                   slice = TRUE,
                   pl = TRUE,
                   prior = prior.c,
                   family = "categorical", verbose = FALSE,
                   nitt = NITT, burnin = BURNIN, thin = THIN)
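
As a quick sanity check on the run length implied by these settings, the
arithmetic can be spelled out in base R. This is only a sketch of the
bookkeeping; `stored` is a name introduced here for illustration and is not
part of the original script.

```r
nsamp  <- 10000   # desired number of stored posterior samples
THIN   <- 900     # thinning interval
BURNIN <- 10000   # iterations discarded at the start

NITT <- BURNIN + THIN * nsamp
NITT                               # 9010000 iterations in total

# Samples kept after discarding the burn-in and thinning
stored <- (NITT - BURNIN) / THIN
stored                             # 10000
```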

Survival is binary (0 or 1) and is observed only once per experimental
plant. The observation-level (residual) variance R is therefore fixed at 1,
as in the linked example.

Site, b, and c are two-level categorical variables. x is crossed with Field
and Field_block, but Field_block is nested within Field.
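
The nesting described above can be checked with a small cross-tabulation.
The sketch below uses a made-up layout (`toy`) and a hypothetical helper
(`is_nested`); neither comes from the real data set `dset`. It also assumes
that block labels are unique across fields, which, as far as I understand,
is what a term like `Field_block` in the random formula relies on.

```r
# Hypothetical toy layout; the real data frame in the post is `dset`.
toy <- data.frame(
  Site        = c("S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2"),
  Field       = c("F1", "F1", "F2", "F2", "F3", "F3", "F4", "F4"),
  Field_block = c("F1_a", "F1_b", "F2_a", "F2_b",
                  "F3_a", "F3_b", "F4_a", "F4_b")
)

# TRUE if every level of `inner` occurs within exactly one level of `outer`
is_nested <- function(inner, outer) {
  tab <- table(inner, outer)
  all(rowSums(tab > 0) == 1)
}

is_nested(toy$Field_block, toy$Field)  # blocks within fields
is_nested(toy$Field, toy$Site)         # fields within sites
is_nested(toy$Site, toy$Field)         # FALSE: each site spans several fields
```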

Models are run for each species separately.

My questions are:

a) Many of the worked examples on which I based my own analysis use the
Gelman-Rubin criterion, where you check convergence by running the model
several times and then comparing the resulting chains.

However, I think the MCMCglmm vignette said to start the model running with
overdispersed priors, which is definitely not an option for me with the
kind of data I have.

I have tried testing the Gelman-Rubin criterion nonetheless, but the Gelman
diagnostic plots do not show an oscillating line that finally converges on
a value, but rather clines and straight lines.
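
For reference, the basic idea behind the Gelman-Rubin statistic can be
sketched in base R. This is a simplified version written for illustration
(the function name `psrf` and the simulated chains are assumptions, not
output from the models above); `coda::gelman.diag()` is the usual tool and
applies an additional correction. As far as I understand, the
overdispersion in this approach refers to the chains' starting values
rather than to the priors.

```r
set.seed(1)

# Basic potential scale reduction factor.
# `chains`: numeric matrix, rows = iterations, columns = independent chains.
psrf <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))    # mean within-chain variance
  B <- n * var(colMeans(chains))      # between-chain variance
  var_hat <- (n - 1) / n * W + B / n  # pooled variance estimate
  sqrt(var_hat / W)
}

# Four simulated chains sampling the same target: value close to 1
mixed <- matrix(rnorm(4000), ncol = 4)

# The same chains shifted apart, mimicking chains stuck in different regions
stuck <- sweep(mixed, 2, c(0, 2, 4, 6), "+")

psrf(mixed)
psrf(stuck)
```

Values close to 1 suggest that the chains agree; values well above 1
suggest that they have not yet converged on the same distribution.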

b) I am also not quite sure whether the value at which R is fixed is
appropriate for all the models I run. For some models, I still get latent
variable values larger than 20, even with a very large number of
iterations.
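
To give a sense of scale, the sketch below converts a few latent values to
probabilities with the inverse logit, ignoring the additional residual
variance that is fixed at 1 in the model. The values in `latent` are chosen
for illustration only. With pl=TRUE, the sampled latent variables should be
available in the `Liab` component of the fitted model object.

```r
# Illustrative latent values on the logit scale
latent <- c(0, 2, 5, 10, 20)

# Inverse logit: implied probability of survival
p <- plogis(latent)
data.frame(latent, p)

# Probability of death implied by a latent value of 20: about 2e-09
p_death_20 <- plogis(-20)
p_death_20
```

A latent value of 20 therefore corresponds to a survival probability that
is indistinguishable from 1 in practice.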

c) How do you decide whether to use family="categorical" (logit link) or
"ordinal" (probit link)? Based on the DIC of the models?
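
As background to this question, the two link functions differ mostly in
scale. The base R sketch below compares the inverse logit and inverse probit
curves after rescaling by the commonly quoted constant 1.702; the grid `x`
and the names `scale_const` and `max_gap` are assumptions introduced for
illustration. It does not account for the residual variance that MCMCglmm
adds on top of the link, so it says nothing about DIC itself.

```r
# Grid of linear-predictor values on the probit scale
x <- seq(-4, 4, by = 0.01)

# Commonly quoted scaling constant between the logit and probit scales
scale_const <- 1.702

# Largest absolute difference between the two response curves
max_gap <- max(abs(plogis(scale_const * x) - pnorm(x)))
max_gap   # below 0.01
```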

d) For many of my models, the variance explained by the random effects
Field and Field_block is very high, sometimes reaching an upper estimate
of 99%.
I think the problem is that Field_block is not only nested in Field, but
Field is also nested in the categorical fixed effect Site.
Is my model overparameterized with regard to Field, since I have nearly
complete survival in one of the two levels of Site?
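
The pattern described in d) can be illustrated with a cross-tabulation of
survival by Site. The data below are invented for illustration (the real
data frame is `dset`); an empty cell in such a table is the signature of
quasi-complete separation for that factor level.

```r
# Invented example: every plant survives at Site A, mixed outcomes at Site B
toy <- data.frame(
  Site     = rep(c("A", "B"), each = 10),
  survival = c(rep(1, 10), rep(c(0, 1), times = c(6, 4)))
)

tab <- table(toy$Site, toy$survival)
tab

# TRUE if some Site level has no observations for one of the outcomes
has_empty_cell <- any(tab == 0)
has_empty_cell
```

Because Field is nested in Site, the Field effects at such a site can only
be informed by very few (or no) deaths, which may be part of why their
variance is poorly identified.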

Kind regards,
Jasmin
