[R-sig-ME] MCMCglmm models and (quasi-)complete separation

Fri May 25 16:48:59 CEST 2018

Dear Jasmin,

Have you had a look at the MCMCglmm Course Notes (apologies if you have)? Specifically in Ch. 2 (specifically pp.52-57, of my Nov. 14, 2016 version, if you want to jump right into some of the relevant chunks). That at least gives an example and works through some of the issues you are dealing with. In particular, when Jarrod explains setting a better prior on the fixed effects, he drops the models global intercept, which might be pretty helpful in your case.

Just a couple (~5) additional thoughts inserted to your original email below. Hope this helps!

Sincerely,
Matthew
________________________________________
From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> on behalf of Jasmin Herden <jasmin.herden using gmail.com>
Sent: Friday, May 25, 2018 7:05 AM
To: r-sig-mixed-models using r-project.org; David.Duffy using qimrberghofer.edu.au
Subject: Re: [R-sig-ME] MCMCglmm models and (quasi-)complete separation

Dear David,

a)

*Dear Jasmin. I, at least, would need to see some kind of data to
understand your comments re separation etc.*
Here are the first few lines of the data set.

> head(dsetMailingList)       Site Field Species PlantID native.neophyte Treatment Seed_family Origin survival_to_harvest   Sp_Sf Field_Subplot
355 Potsdam    DB     Dat Dat10CK        neophyte       CON
E10      K                   1 Dat.E10        DB _ C
356 Potsdam    MQ     Dat Dat10CK        neophyte       CON
E10      K                   0 Dat.E10        MQ _ B
357 Potsdam    GR     Dat Dat10CK        neophyte       CON
E10      K                   1 Dat.E10        GR _ B
358 Potsdam    GR     Dat Dat10CP        neophyte       CON
10      P                   0  Dat.10        GR _ B
359 Potsdam    DB     Dat Dat10CP        neophyte       CON
10      P                   1  Dat.10        DB _ A
360 Potsdam    MQ     Dat Dat10CP        neophyte       CON
10      P                   1  Dat.10        MQ _ B

>

b) *Does a simpler model run eg dropping field_block and simplifying the
fixed effects? And does such a model run in lmer or other programs?*

I had originally started with a binomial glmer model, but in most species
this lead to ridiculously large standard errors. These large SE values are
indicative of quasi-complete separation (i.e. nearly complete 0 or 1 for
one factor level). (See also the following link:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004172.html).

I have tried using blglmer as suggested by Ben Bolker in the example
referenced in my first post to the mailing list. In principle, all models
with complete separation looked good with bglmer.
However, my intended approach in glmer was to take the full model and get
likelihood ratio tests and corresponding p-values for the fixed effects by
step-wise model
reduction. Theoretically, this is also possible with bglmer, but Ben Bolker
advised against using bglmer for more than just an overall model check in
several posts. Furthermore, I am not quite sure if or how I need to adjust
the priors in bglmer during step-wise model reduction or if step-wise model
reduction is appropiate for bglmer at all. Ben Bolker recommended using the
MCMCglmm package instead of bglmer for a proper analysis.

Due to my approach of testing all fixed effects (Site, Origin, and
Treatment, and interactions) in my local adaptation study, reducing the
fixed effects is not really an option.

Kind regards,
Jasmin Herden

On Fri, May 25, 2018 at 8:58 AM, David Duffy <David.Duffy using qimrberghofer.
edu.au> wrote:

>  Does a simpler model run eg dropping field_block and simplifying the
> fixed effects? And does such a model run in lmer or other programs?
> Diagnosing a nonidentified model (ie some parameters in your model may not
> be estimable given your pattern of observations) in MCMC is hard.
> ________________________________________
> From: R-sig-mixed-models [r-sig-mixed-models-bounces using r-project.org] on
> behalf of Jasmin Herden [jasmin.herden using gmail.com]
> Sent: Thursday, 24 May 2018 10:16 PM
> To: r-sig-mixed-models using r-project.org
> Subject: Re: [R-sig-ME] MCMCglmm models and (quasi-)complete separation
>
> Dear fellow R users,
>
> I have recently started using the MCMCglmm R package to analyse some of my
> problematic
> data which severely suffers from (quasi)complete separation.
>
> I have followed Ben Bolker's suggestions of zero-mean Normal priors on the
> fixed effects to analyse such kinds of data.
> (https://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html)
>
> My model is:
>
> k<-8 #number of the fixed effects
>      #Intercept+single effects+interactions
>

####  <<INSERT>>  ####
# So if you removed the global intercept, k<-7

> prior.c <- list(B=list(V=diag(9,k), mu=rep(0,k)),
>                 R=list(V=1,fix=1),
>                 G=list(G1=list(V=1, nu=1,alpha.mu=0, alpha.V=1000),
>                        G2=list(V=1,nu=1,alpha.mu=0, alpha.V=1000),
>                        G3=list(V=1,nu=1,alpha.mu=0, alpha.V=1000)))
>
> nsamp <- 10000
> THIN <- 900
> BURNIN <- 10000
> NITT <- BURNIN + THIN*nsamp

####  <<INSERT>>  ####
# You may want to consider a larger thinning interval (and for trial runs of the model, nsamp~1000 will suffice)
## I can't tell you an exact number, but it could be THIN <- 5000 or 10000
# These specifications also partly depend on the size of your dataset and the number of levels of fixed
## and random effects and the distribution of 0s/1s across these; which we cannot get from `head()`

> model3 = MCMCglmm(survival~
>                     Site*b*c,
>                   random=~x+Field+Field_block,
>                   data=dset,
>                     slice=TRUE,
>                     pl=T,
>                     prior=prior.c,
>                     family="categorical",verbose=FALSE,
>                     nitt=NITT,burnin=BURNIN,thin=THIN)
>
> Survival is a binary value of 0 or 1 and is observed only once per
> experimental plant.
> Therefore the observation-level variance R is fixed to 1. (As in the linked
> example.)
>
> Site, b, and c are two-level categorical variables. x is crossed with Field
> and Field_block, but Field_block is nested within Field.
>
> Models are run for each species separately.
>
> My questions are:
>
> a) Many worked examples which I based my own analysis on use the
> Gelman-Rubin
> criterion where you check the convergence of your model by running it a
> number of times and then compare models.
>
> However, I think the MCMCglmm vignette said to start the model running with
> overdispersed priors which is definitely not an option for me with the kind
> of data I have.
>
> I have tried using the testing for the Gelman-Rubin criterion nonetheless,
> but the Gelman diagnostic plots do not show a oscillating line that finally
> converges on a value but
> rather clines and straigt lines.

####  <<INSERT>>  ####
# I'm not sure to what part of the MCMCglmm vignette you are referring, but to use this
## Approach with multiple models, you will need to specify starting values.
## Otherwise, each model might proceed along the same chain (if I'm not mistaken) and you just get
## n of the same model/posterior
# To specify starting values:

# I can't remember if it is necessary, but it would be good to set the random seed for each model
m <- 1  #<-- model=1
set.seed(101+m)
#XXX Make sure matches prior in structure
# nfx <- number of liabilities
startN <- list(liab = rnorm(nfx, 0, 12),   # Need to specify the liabilities  
        B = list(V=diag(20,k), mu=rep(0,k)),
	R = list(R1 = rIW(diag(1), 15, fix = 1)), # where I have 15, tailor to the values you are getting
	G = list(G1 = rIW(diag(1), 15),
		G2 = rIW(diag(1), 15),
		G3 = rIW(diag(1), 15)))
# Then in MCMCglmm()
## MCMCglmm(...., start = startN)

>
> b) I am also not quite sure, if the value R is fixed at is appropiate for
> all models I run. For some
> models, I still get latent variable values bigger than 20, even at very
> high numbers of iterations.
>

####  <<INSERT>>  ####
# fixing R=1 is fairly standard for family="categorical".

> c) How do you decide to use family="categorical" (=logit link) or "ordinal"
> (=probit link)?
> Based on the DIC of the models?
>
> d) For many of my models, the explained variance for the random effects
> Field and Field_block are very high; sometimes reaching an upper estimate
> of 99%.
> I think the problem is that Field_block is not only nested in Field but
> that Field is also
> nested in the categorical fixed effect Site.
> Is my model overparametrized with regard to Field, since I have nearly
> complete survival in one of the two levels of Site?

####  <<INSERT>>  ####
# You might need to make sure the model you specify is converging OK, before
## deciding about your model specification based on the parameters it has estimated

>
> Kind regards,
> Jasmin
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models