[R-sig-ME] Model is nearly unidentifiable with glmer
Martin Maechler
maechler at stat.math.ethz.ch
Mon Feb 24 19:11:10 CET 2014
>>>>> Ramon Diaz-Uriarte <rdiaz02 at gmail.com>
>>>>> on Mon, 24 Feb 2014 16:46:06 +0100 writes:
> I think I figured out what was happening. This is the summary, in case it
> helps anyone else in the future.
Thank you, Ramon,
for this summary even though nobody seemed to have helped you.
I think we lme4 developers should be quite happy if you could
provide us with a reproducible example -- off-list if you
want.
If your real data is somewhat confidential,
I think you could use
new.y <- simulate(<fitted glmer>)
and add new.y to your data.frame in place of the original y
(and then check that 'new.y' also shows the problem).
We'd be happy to receive the
save(<data-frame>, file = "RDU-glmer-ex.rda")
data together with an R script using glmer() and producing the
warnings you've been seeing.
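A minimal sketch of that workflow, using a toy Poisson data set as a stand-in for the confidential data (all object, variable, and file names here are just placeholders):

```r
library(lme4)

## toy stand-in for the confidential data (hypothetical names throughout)
set.seed(1)
d <- data.frame(x = rnorm(200), g = gl(20, 10))
re <- rnorm(20, sd = 0.5)                       # group-level random effects
d$y <- rpois(200, exp(0.5 + 0.3 * d$x + re[d$g]))

fit <- glmer(y ~ x + (1 | g), family = poisson, data = d)

## simulate a fresh response from the fitted model and use it
## in place of the original (confidential) response
d$y <- simulate(fit)[[1]]

## refit to check that the simulated response still shows the same behaviour
fit2 <- update(fit, data = d)

## then save the anonymized data frame for the developers
save(d, file = "RDU-glmer-ex.rda")
```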
Thank you in advance.
Martin Maechler, ETH Zurich
> The problem seems to be caused by no variation at the "dataSetID" (the
> random effect) level in some experimental level combinations.
> If I fit the model to some subsets of the among-dataSetID combinations
> (i.e., exactly the same design, except with fewer values for some factors),
> I can get a "singular fit" warning and 0 for the variance of the random
> effect (and ranef which are all zero). Sure enough, at least in some cases,
> changing just a single value of the response gets rid of the warning (and
> leads to non-zero variance of the random effect).
> Now, when I use larger models that are a superset of the above singular fit
> models, I get the "nearly unidentifiable" problem, even when the output
> from the fit does not show 0 for the variance of the random effects (and
> the warning will disappear if we change just a few values of the response).
> As originally thought, the warning is not caused by some inherent problem
> in the model matrix (i.e., in the design). But I could not pin down the
> problem earlier because I was making wrong assumptions when I saw that the
> random effects in the large models did not show 0 variance.
> Best,
> R.
> On Wed, 19-02-2014, at 13:40, rdiaz02 at gmail.com wrote:
>> Dear All,
>>
>> Summary:
>> ========
>>
>> I am running some models with glmer that are giving, among others, the
>> warning message
>>
>> In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv, :
>> Model is nearly unidentifiable: very large eigenvalue
>> - Rescale variables?
>>
>>
>> but I cannot understand why the model is nearly unidentifiable, based both
>> on the details of the experimental design and on comparison with other
>> approaches (which run without complaints and provide similar results).
>> Moreover, I have no idea what I am supposed to rescale.
>>
>>
>>
>>
>> Details:
>> ========
>>
>> The call to glmer
>> -----------------
>>
>> gt.0 <- glmer(nS ~ Tree + Model + sh + S.Type + S.Time + S.Size + Method +
>> (1|dataSetID) + (1|obs),
>> family = poisson,
>> data = NP.all,
>> control = glmerControl(
>> check.conv.singular="warning",
>> optCtrl = list(maxfun = 10000))
>> )
>>
>> The model is for Poisson data, allowing for overdispersion (with the obs
>> variable) and with a "dataSetID" random effect. (The glmerControl has been
>> added to try to understand what might be happening).
>>
>>
>>
>> The design
>> ----------
>>
>> The data come from a simulation experiment. For each combination of
>> Tree * Model * sh * S.Type * S.Time * S.Size there are 20 replicate
>> simulations (each identified by a dataSetID), so a full factorial
>> design. Each variable is a factor (6 levels for Tree, 4 for Model,
>> 2 for sh, 3 for S.Type, 2 for S.Time, 3 for S.Size).
>>
>> Each replicate simulation, identified by a "dataSetID", is subject to four
>> Methods (a factor). That is why dataSetID is a random effect.
>>
>> There are no missing values.
>>
>> Thus, there are 17280 dataSetID groups (6*4*2*3*2*3*20), each with four
>> observations, resulting in a total of 69120 observations. glmer does
>> report these values just fine.
>>
>> Therefore, based on that design, I think I should be able to fit not just
>> the model above, but a model with all possible interactions. In fact, that
>> model (a saturated model in the log-linear parlance, IIUC) is what I want
>> to start from.
>>
>>
>> The warnings
>> ------------
>>
>> glmer gives two warnings:
>> 1: In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv, :
>> Model failed to converge with max|grad| = 392.339 (tol = 0.001)
>> 2: In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv, :
>> Model is nearly unidentifiable: very large eigenvalue
>> - Rescale variables?
>>
>>
>> The failure to converge I don't like, but OK. However, the "nearly
>> unidentifiable" I just don't understand. And what variables am I supposed
>> to rescale?
>>
>>
>> Removing the random effect for obs or for dataSetID does not eliminate the
>> warnings (and, in fact, that should not be the problem, since both obs and
>> dataSetID get non-zero estimates).
>>
>>
>> Other methods
>> -------------
>>
>> I have fitted the above model with:
>>
>> - MCMCglmm (using the priors list(R = list(V = 1, nu = 0.002), G = list(G1
>> = list(V = 1, nu = 0.002))))
>>
>>
>> - bglmer (with all as per default)
>>
>>
>> - glmer2stan (here using a more complex model with a bunch of interactions)
>>
>> - glmmadmb (with family "nbinom")
>>
>> and even if glmmadmb does complain about lack of convergence, there are no
>> further problems.
>>
>> Moreover, with MCMCglmm I've fitted the model with interactions. Again, no
>> apparent problems (beyond slow mixing in some cases).
>>
>>
>> In addition, I have also fitted several other models that include all
>> possible interactions, just to make sure again that I do not have something
>> silly in the model matrix. In some cases, the response is different (as
>> when fitting an lm/lmer). Beyond the lack of convergence of the lmer fit, no
>> other problems.
>>
>>
>> ## a GLM with interactions but, of course, without the dataSetID random
>> effect.
>>
>> glm(nS ~ Tree * Model * sh *
>> S.Type * S.Time * S.Size * Method,
>> data = NP.all,
>> family = poisson
>> )
>>
>> ## a linear model with interactions
>>
>> lm(Dissim ~ Tree * Model * sh *
>> S.Type * S.Time * S.Size * Method,
>> data = NP.all
>> )
>>
>> ## an lmer
>>
>> lmer(Dissim ~ Tree * Model * sh *
>> S.Type * S.Time * S.Size * Method +
>> (1|dataSetID),
>> data = NP.all,
>> control = lmerControl(
>> check.conv.singular="warning",
>> optCtrl = list(maxfun = 10000))
>> )
>> ## This complains about lack of convergence, but raises no identifiability
>> ## problems.
>>
>>
>>
>> (Lack of) Differences between fits
>> -----------------------------------
>>
>> If anyone wants, I can of course provide the output from the
>> fits. Anyway, the qualitative summary is that the estimates for the
>> fixed effects are very similar between glmer, glmmadmb, bglmer, and MCMCglmm, and
>> the estimates for the random effects are very similar for MCMCglmm,
>> bglmer, and glmer.
>>
>>
>>
>>
>>
>> So, what am I missing?
>>
>>
>> Best,
>>
>>
>>
>> R.
> --
> Ramon Diaz-Uriarte
> Department of Biochemistry, Lab B-25
> Facultad de Medicina
> Universidad Autónoma de Madrid
> Arzobispo Morcillo, 4
> 28029 Madrid
> Spain
> Phone: +34-91-497-2412
> Email: rdiaz02 at gmail.com
> ramon.diaz at iib.uam.es
> http://ligarto.org/rdiaz
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models