[R-sig-ME] Model is nearly unidentifiable with glmer

Mon Feb 24 16:46:06 CET 2014

I think I figured out what was happening. This is the summary, in case it
helps anyone else in the future.

The problem seems to be caused by no variation at the "dataSetID" (the
random effect) level in some experimental level combinations. 

If I fit the model to some subsets of the among-dataSetID combinations
(i.e., exactly the same design, except with fewer values for some factors),
I can get a "singular fit" warning and 0 for the variance of the random
effect (and ranef which are all zero). Sure enough, at least in some cases,
changing just a single value of the response gets rid of the warning (and
leads to non-zero variance of the random effect).

Now, when I use larger models that are a superset of the above singular fit
models, I get the "nearly unidentifiable" problem, even when the output
from the fit does not show 0 for the variance of the random effects (and
the warning will disappear if we change just a few values of the response).

As originally thought, the warning is not caused by some inherent problem
in the model matrix ---i.e., in the design---.  But I could not identify
the problem because I was making wrong assumptions when seeing that the
random effects in the large models did not show 0 variance.

Best,

R.

On Wed, 19-02-2014, at 13:40, rdiaz02 at gmail.com wrote:
> Dear All,
>
> Summary:
> ========
>
> I am running some models with glmer that are giving, among others, the
> warning message
>
> In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv,  :
>   Model is nearly unidentifiable: very large eigenvalue
>  - Rescale variables?
>
>
> but I cannot understand why the model is nearly unidentifable, based both
> on the details of the experimental design, and by comparison with other
> approaches (that run without complains and provide similar results).
> Moreover, I have no idea what I am supposed to rescale.
>
>
>
>
> Details:
> ========
>  
> The call to glmer
> -----------------
>
> gt.0 <- glmer(nS ~ Tree + Model + sh + S.Type + S.Time + S.Size + Method +
>                             (1|dataSetID) + (1|obs), 
>                             family = poisson,
>                             data = NP.all,
>                             control = glmerControl(
>                                 check.conv.singular="warning",
>                                 optCtrl = list(maxfun = 10000))
>  				)
>
> The model is for Poisson data, allowing for overdispersion (with the obs
> variable) and with a "dataSetID" random effect. (The glmerControl has been
> added to try to understand what might be happening).
>
>
>
> The design
> ----------
>
> The data come from a simulation experiment. For each combination of
>
> Tree * Model * sh * S.Type * S.Time * S.Size there are 20 replicate
> simulations (each identified by a dataSetID), so a full factorial
> design. Each variable is a factor variable (6 levels for Tree, 4 for Model,
> 2 for sh, 3 for S.Type, 2 for S.Time, 3 for S.Size)
>
> Each replicate simulation, identified by a "dataSetID", is subject to four
> Methods (a factor). That is why dataSetID is a random effect.
>
> There are no missing values.
>
> Thus, there are 17280 dataSetID groups (6*4*2*3*2*3*20), each with four
> observations, resulting in a total of 69120 observations. glmer does
> report these values just fine.
>
> Therefore, based on that design, I think I should be able to fit not just
> the model above, but a model with all possible interactions. In fact, that
> model (a saturated model in the log-linear parlance, IIUC) is what I want
> to start from.
>
>
> The warnings
> ------------
>
> glmer gives two warnings:
> 1: In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv,  :
>   Model failed to converge with max|grad| = 392.339 (tol = 0.001)
> 2: In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv,  :
>   Model is nearly unidentifiable: very large eigenvalue
>  - Rescale variables?
>
>
> The failure to converge I don't like, but OK. However, the "nearly
> unidentifiable" I just don't understand. And what variables am I supposed
> to rescale?
>
>
> Removing the random effect for obs or for dataSetID does not eliminate the
> warnings (and, in fact, that should not be the problem, since both obs and
> dataSetID get non-zero estimates).
>
>  
> Other methods
> -------------
>
> I have fitted the above model with:
>
> - MCMCglmm (using the priors list(R = list(V = 1, nu = 0.002), G = list(G1
>   = list(V = 1, nu = 0.002))))
>
>
> - bglmer (with all as per default)
>
>
> - glmer2stan (here using a more complex model with a bunch of interactions)
>
> - glmmadmb (with family "nbinom")
>
> and even if glmmadmb does complain about lack of convergence, there are no
> further problems.
>
> Moreover, with MCMCglmm I've fitted the model with interactions. Again, no
> apparent problems (beyond slow mixing in some cases).
>
>
> In addition, I have also fitted several other models that include all
> possible interactions, just to make sure again I do not have something
> silly in the model matrix. In some cases, the response is different (as
> when fitting a lm/lmer). Beyond the lack of convergence of lmer fit, no
> other problems.
>
>
> ## a GLM with interactions but, of course, without the dataSetID random
>    effect.
>
> glm(nS ~ Tree * Model * sh *
>                              S.Type * S.Time * S.Size * Method,
>                              data = NP.all,
>                              family = poisson
>                              )
>
> ## a linear model with interactions 			     
> 			     
> lm(Dissim ~ Tree * Model * sh *
>                              S.Type * S.Time * S.Size * Method,
>                              data = NP.all
>                              )
>  
> ## a lmers
>
> lmer(Dissim ~ Tree * Model * sh *
>                              S.Type * S.Time * S.Size * Method +
>                              (1|dataSetID),
>                              data = NP.all,
>                              control = lmerControl(
>                                  check.conv.singular="warning",
>                                  optCtrl = list(maxfun = 10000))
>                              )
> ## This complaints of lack of convergence, but no identifiability
>                              problems. 
> 			     
>
> 			     
> (Lack of) Differences between fits
> -----------------------------------
>
> If anyone wants I can provide, of course, the output from the
> fits. Anyway, the qualitative summary is that the the estimates for the
> fixed are very similar between glmer, admb, bglmer, and MCMCglmm and
> the estimates for the random effects are very similar for MCMCglmm,
> bglmer, and glmer.
>
>
>
>
>
> So, what am I missing?
>
>
> Best,
>
>
>
> R.

-- 
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina 
Universidad Autónoma de Madrid 
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02 at gmail.com
       ramon.diaz at iib.uam.es

http://ligarto.org/rdiaz