[R-sig-ME] addressing singularity in lme4 fits caused by subsets of contrasts

Thu Jan 21 23:01:58 CET 2021

   This is an interesting question ("interesting" means among other 
things "I don't know").

   If you get a variance estimate of zero for the second contrast then 
removing that term from the model should (I think) give you **exactly 
the same** model results (as an analogy: suppose you had the mean model 
y = a +b*x+c*z and for some reason got an estimate of c=0, then you said 
"can I drop z from the model?")

   More generally, in order to know whether this is OK you have to 
define what "OK" means.  Trying to avoid philosophical or subjective 
statements, you could ask whether following this process gives 'good' 
results (unbiased and/or low-error estimates and good coverage of 
whichever set of parameters you're interested in). In particular, if 
you're interested in inference on fixed effects only, then I'd say you 
can do anything to the random effects component of the model as long as 
it doesn't mess up your estimation and inference on the fixed effects.

   You could try some simulations to test your idea (note that your 
conclusions can only be for the range of parameters you've actually 
simulated: in particular Bates et al 2015 criticize the realism of the 
simulations from Barr et al 2013 "keep it maximal":

"First, the simulations implement a factorial contrast that is 
atypically large compared to what is found in natural data.  Second, and 
more importantly, the correlations in the random effects structure range 
from−0.8 to +0.8.  Such large correlation parameters are indicative of 
overparameterization.They hardly ever represent true correlations in the 
population.  As a consequence, these simulations  do  not  provide  a 
solid  foundation  for  recommendations  about  how  to  fit 
mixed-effects models to empirical data."

Bates, Douglas, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen. 
“Parsimonious Mixed Models.” ArXiv:1506.04967 [Stat], June 16, 2015. 
http://arxiv.org/abs/1506.04967.

On 1/21/21 11:54 AM, Nathan Tardiff wrote:
> I have encountered an issue a couple times recently when fitting models in
> lme4 that I have not seen addressed in commonly cited papers for dealing w/
> boundary/singular fit issues.
> 
> Say I have a categorical variable representing a set of within-subject
> contrasts, which is entered into the model as a set of effect coded
> variables, e.g.
> 
> df$congruent.f <- factor(df$congruent,levels=c(1,0,-1),
>                              labels=c("congruent","incongruent","none"))
> contrasts(df$congruent.f) <- contr.sum(3)
> 
> which will produce two variables in the model (e.g. congruent.f1,
> congruent.f2). When I fit the model w/ random intercepts and slopes for
> these contrasts (along w/ other control variables), I get a boundary
> (singular) fit warning.
> 
> Examining the correlations and variance components suggests that the
> primary cause of the warning is in one of these contrast variables (e.g.
> congruent.f2). So, would it ever be acceptable in this scenario to remove
> the random effect term ONLY for congruent.f2, not the entire set of
> congruent.f contrasts, where the goal is statistical inference and I do not
> want the p-values/confidence intervals for congruent.f1 to be
> anticonservative when it does in fact show variance across subjects?
> 
> I have to this point assumed that this would be a bad idea and tried to
> simplify such models in other ways (i.e. setting correlations to zero or
> removing other random effects), but this does not always work and seems a
> roundabout method if you are not dealing with the primary problem.
> 
> Thanks,
> Nathan
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>