[R-sig-ME] Model is nearly unidentifiable with glmer

Tue Feb 25 11:48:11 CET 2014

Dear Martin,

On Mon, 24-02-2014, at 19:11, maechler at stat.math.ethz.ch wrote:
>>>>>> Ramon Diaz-Uriarte <rdiaz02 at gmail.com>
>>>>>>     on Mon, 24 Feb 2014 16:46:06 +0100 writes:
>
>     > I think I figured out what was happening. This is the summary, in case it
>     > helps anyone else in the future.
>
> Thank you, Ramon,
> for this summary even though nobody seemed to have helped you.
>
> I think we lme4 developers should be quite happy if you could
> provide us with a reproducible example -- off-public if you
> want.
>
> If your real data is somewhat confidential,
> I think you could use
>
> new.y <- simulate(<fitted glmer>)
>
> and add new.y to your data.frame and subtract the original y
> (and then check that 'new.y' also shows the problem).
> We'd be happy to receive the 
>       save(<data-frame>, file = "RDU-glmer-ex.rda")
> data together with an R script using glmer() and producing the
> warnings you've been seeing.

I've left at

http://ligarto.org/rdiaz/nearly-unidentifiable.tar.gz

a small (230 KB) file with:

- data
- R code to show the problem
- two runs of that code on different machines

Best,

R.

>
> Thank you in advance.
> Martin Maechler, ETH Zurich
>
>     > The problem seems to be caused by no variation at the "dataSetID" (the
>     > random effect) level in some experimental level combinations. 
>
>
>     > If I fit the model to some subsets of the among-dataSetID combinations
>     > (i.e., exactly the same design, except with fewer values for some factors),
>     > I can get a "singular fit" warning and 0 for the variance of the random
>     > effect (and ranef which are all zero). Sure enough, at least in some cases,
>     > changing just a single value of the response gets rid of the warning (and
>     > leads to non-zero variance of the random effect).
>
>
>     > Now, when I use larger models that are a superset of the above singular fit
>     > models, I get the "nearly unidentifiable" problem, even when the output
>     > from the fit does not show 0 for the variance of the random effects (and
>     > the warning will disappear if we change just a few values of the response).
>
>
>     > As originally thought, the warning is not caused by some inherent problem
>     > in the model matrix ---i.e., in the design---.  But I could not identify
>     > the problem because I was making wrong assumptions when seeing that the
>     > random effects in the large models did not show 0 variance.
>
>
>
>     > Best,
>
>
>     > R.
>
>
>  
>
>     > On Wed, 19-02-2014, at 13:40, rdiaz02 at gmail.com wrote:
>     >> Dear All,
>     >> 
>     >> Summary:
>     >> ========
>     >> 
>     >> I am running some models with glmer that are giving, among others, the
>     >> warning message
>     >> 
>     >> In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv,  :
>     >> Model is nearly unidentifiable: very large eigenvalue
>     >> - Rescale variables?
>     >> 
>     >> 
>     >> but I cannot understand why the model is nearly unidentifable, based both
>     >> on the details of the experimental design, and by comparison with other
>     >> approaches (that run without complains and provide similar results).
>     >> Moreover, I have no idea what I am supposed to rescale.
>     >> 
>     >> 
>     >> 
>     >> 
>     >> Details:
>     >> ========
>     >> 
>     >> The call to glmer
>     >> -----------------
>     >> 
>     >> gt.0 <- glmer(nS ~ Tree + Model + sh + S.Type + S.Time + S.Size + Method +
>     >> (1|dataSetID) + (1|obs), 
>     >> family = poisson,
>     >> data = NP.all,
>     >> control = glmerControl(
>     >> check.conv.singular="warning",
>     >> optCtrl = list(maxfun = 10000))
>     >> )
>     >> 
>     >> The model is for Poisson data, allowing for overdispersion (with the obs
>     >> variable) and with a "dataSetID" random effect. (The glmerControl has been
>     >> added to try to understand what might be happening).
>     >> 
>     >> 
>     >> 
>     >> The design
>     >> ----------
>     >> 
>     >> The data come from a simulation experiment. For each combination of
>     >> 
>     >> Tree * Model * sh * S.Type * S.Time * S.Size there are 20 replicate
>     >> simulations (each identified by a dataSetID), so a full factorial
>     >> design. Each variable is a factor variable (6 levels for Tree, 4 for Model,
>     >> 2 for sh, 3 for S.Type, 2 for S.Time, 3 for S.Size)
>     >> 
>     >> Each replicate simulation, identified by a "dataSetID", is subject to four
>     >> Methods (a factor). That is why dataSetID is a random effect.
>     >> 
>     >> There are no missing values.
>     >> 
>     >> Thus, there are 17280 dataSetID groups (6*4*2*3*2*3*20), each with four
>     >> observations, resulting in a total of 69120 observations. glmer does
>     >> report these values just fine.
>     >> 
>     >> Therefore, based on that design, I think I should be able to fit not just
>     >> the model above, but a model with all possible interactions. In fact, that
>     >> model (a saturated model in the log-linear parlance, IIUC) is what I want
>     >> to start from.
>     >> 
>     >> 
>     >> The warnings
>     >> ------------
>     >> 
>     >> glmer gives two warnings:
>     >> 1: In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv,  :
>     >> Model failed to converge with max|grad| = 392.339 (tol = 0.001)
>     >> 2: In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv,  :
>     >> Model is nearly unidentifiable: very large eigenvalue
>     >> - Rescale variables?
>     >> 
>     >> 
>     >> The failure to converge I don't like, but OK. However, the "nearly
>     >> unidentifiable" I just don't understand. And what variables am I supposed
>     >> to rescale?
>     >> 
>     >> 
>     >> Removing the random effect for obs or for dataSetID does not eliminate the
>     >> warnings (and, in fact, that should not be the problem, since both obs and
>     >> dataSetID get non-zero estimates).
>     >> 
>     >> 
>     >> Other methods
>     >> -------------
>     >> 
>     >> I have fitted the above model with:
>     >> 
>     >> - MCMCglmm (using the priors list(R = list(V = 1, nu = 0.002), G = list(G1
>     >> = list(V = 1, nu = 0.002))))
>     >> 
>     >> 
>     >> - bglmer (with all as per default)
>     >> 
>     >> 
>     >> - glmer2stan (here using a more complex model with a bunch of interactions)
>     >> 
>     >> - glmmadmb (with family "nbinom")
>     >> 
>     >> and even if glmmadmb does complain about lack of convergence, there are no
>     >> further problems.
>     >> 
>     >> Moreover, with MCMCglmm I've fitted the model with interactions. Again, no
>     >> apparent problems (beyond slow mixing in some cases).
>     >> 
>     >> 
>     >> In addition, I have also fitted several other models that include all
>     >> possible interactions, just to make sure again I do not have something
>     >> silly in the model matrix. In some cases, the response is different (as
>     >> when fitting a lm/lmer). Beyond the lack of convergence of lmer fit, no
>     >> other problems.
>     >> 
>     >> 
>     >> ## a GLM with interactions but, of course, without the dataSetID random
>     >> effect.
>     >> 
>     >> glm(nS ~ Tree * Model * sh *
>     >> S.Type * S.Time * S.Size * Method,
>     >> data = NP.all,
>     >> family = poisson
>     >> )
>     >> 
>     >> ## a linear model with interactions 			     
>     >> 
>     >> lm(Dissim ~ Tree * Model * sh *
>     >> S.Type * S.Time * S.Size * Method,
>     >> data = NP.all
>     >> )
>     >> 
>     >> ## a lmers
>     >> 
>     >> lmer(Dissim ~ Tree * Model * sh *
>     >> S.Type * S.Time * S.Size * Method +
>     >> (1|dataSetID),
>     >> data = NP.all,
>     >> control = lmerControl(
>     >> check.conv.singular="warning",
>     >> optCtrl = list(maxfun = 10000))
>     >> )
>     >> ## This complaints of lack of convergence, but no identifiability
>     >> problems. 
>     >> 
>     >> 
>     >> 
>     >> (Lack of) Differences between fits
>     >> -----------------------------------
>     >> 
>     >> If anyone wants I can provide, of course, the output from the
>     >> fits. Anyway, the qualitative summary is that the the estimates for the
>     >> fixed are very similar between glmer, admb, bglmer, and MCMCglmm and
>     >> the estimates for the random effects are very similar for MCMCglmm,
>     >> bglmer, and glmer.
>     >> 
>     >> 
>     >> 
>     >> 
>     >> 
>     >> So, what am I missing?
>     >> 
>     >> 
>     >> Best,
>     >> 
>     >> 
>     >> 
>     >> R.
>
>     > -- 
>     > Ramon Diaz-Uriarte
>     > Department of Biochemistry, Lab B-25
>     > Facultad de Medicina 
>     > Universidad Autónoma de Madrid 
>     > Arzobispo Morcillo, 4
>     > 28029 Madrid
>     > Spain
>
>     > Phone: +34-91-497-2412
>
>     > Email: rdiaz02 at gmail.com
>     > ramon.diaz at iib.uam.es
>
>     > http://ligarto.org/rdiaz
>
>     > _______________________________________________
>     > R-sig-mixed-models at r-project.org mailing list
>     > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina 
Universidad Autónoma de Madrid 
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02 at gmail.com
       ramon.diaz at iib.uam.es

http://ligarto.org/rdiaz