[R-sig-ME] Model is nearly unidentifiable with glmer
Martin Maechler
maechler at stat.math.ethz.ch
Mon Feb 24 19:11:10 CET 2014
>>>>> Ramon Diaz-Uriarte <rdiaz02 at gmail.com>
>>>>> on Mon, 24 Feb 2014 16:46:06 +0100 writes:
> I think I figured out what was happening. This is the summary, in case it
> helps anyone else in the future.
Thank you, Ramon,
for this summary even though nobody seemed to have helped you.
I think we lme4 developers should be quite happy if you could
provide us with a reproducible example -- off-list if you
want.
If your real data is somewhat confidential,
I think you could use
new.y <- simulate(<fitted glmer>)
and add new.y to your data.frame in place of the original y
(and then check that 'new.y' also shows the problem).
We'd be happy to receive the
save(<data-frame>, file = "RDU-glmer-ex.rda")
data together with an R script using glmer() and producing the
warnings you've been seeing.
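A minimal sketch of that workflow, using a toy Poisson data set as a stand-in for the confidential data (all object, variable, and file names here are just placeholders):

```r
library(lme4)

## toy stand-in for the confidential data (hypothetical names throughout)
set.seed(1)
d <- data.frame(x = rnorm(200), g = gl(20, 10))
re <- rnorm(20, sd = 0.5)                       # group-level random effects
d$y <- rpois(200, exp(0.5 + 0.3 * d$x + re[d$g]))

fit <- glmer(y ~ x + (1 | g), family = poisson, data = d)

## simulate a fresh response from the fitted model and use it
## in place of the original (confidential) response
d$y <- simulate(fit)[[1]]

## refit to check that the simulated response still shows the same behaviour
fit2 <- update(fit, data = d)

## then save the anonymized data frame for the developers
save(d, file = "RDU-glmer-ex.rda")
```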
Thank you in advance.
Martin Maechler, ETH Zurich
> The problem seems to be caused by no variation at the "dataSetID" (the
> random effect) level in some experimental level combinations.
> If I fit the model to some subsets of the among-dataSetID combinations
> (i.e., exactly the same design, except with fewer values for some factors),
> I can get a "singular fit" warning and 0 for the variance of the random
> effect (and ranef which are all zero). Sure enough, at least in some cases,
> changing just a single value of the response gets rid of the warning (and
> leads to non-zero variance of the random effect).
> Now, when I use larger models that are a superset of the above singular fit
> models, I get the "nearly unidentifiable" problem, even when the output
> from the fit does not show 0 for the variance of the random effects (and
> the warning will disappear if we change just a few values of the response).
> As originally thought, the warning is not caused by some inherent problem
> in the model matrix (i.e., in the design). But I could not pin down the
> problem earlier because I was making wrong assumptions when I saw that the
> random effects in the large models did not show 0 variance.
> Best,
> R.
> On Wed, 19-02-2014, at 13:40, rdiaz02 at gmail.com wrote:
>> Dear All,
>>
>> Summary:
>> ========
>>
>> I am running some models with glmer that are giving, among others, the
>> warning message
>>
>> In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv, :
>> Model is nearly unidentifiable: very large eigenvalue
>> - Rescale variables?
>>
>>
>> but I cannot understand why the model is nearly unidentifiable, based both
>> on the details of the experimental design and on comparison with other
>> approaches (which run without complaints and provide similar results).
>> Moreover, I have no idea what I am supposed to rescale.
>>
>>
>>
>>
>> Details:
>> ========
>>
>> The call to glmer
>> -----------------
>>
>> gt.0 <- glmer(nS ~ Tree + Model + sh + S.Type + S.Time + S.Size + Method +
>> (1|dataSetID) + (1|obs),
>> family = poisson,
>> data = NP.all,
>> control = glmerControl(
>> check.conv.singular="warning",
>> optCtrl = list(maxfun = 10000))
>> )
>>
>> The model is for Poisson data, allowing for overdispersion (with the obs
>> variable) and with a "dataSetID" random effect. (The glmerControl has been
>> added to try to understand what might be happening).
>>
>>
>>
>> The design
>> ----------
>>
>> The data come from a simulation experiment. For each combination of
>> Tree * Model * sh * S.Type * S.Time * S.Size there are 20 replicate
>> simulations (each identified by a dataSetID), so a full factorial
>> design. Each variable is a factor (6 levels for Tree, 4 for Model,
>> 2 for sh, 3 for S.Type, 2 for S.Time, 3 for S.Size).
>>
>> Each replicate simulation, identified by a "dataSetID", is subject to four
>> Methods (a factor). That is why dataSetID is a random effect.
>>
>> There are no missing values.
>>
>> Thus, there are 17280 dataSetID groups (6*4*2*3*2*3*20), each with four
>> observations, resulting in a total of 69120 observations. glmer does
>> report these values just fine.
>>
>> Therefore, based on that design, I think I should be able to fit not just
>> the model above, but a model with all possible interactions. In fact, that
>> model (a saturated model in the log-linear parlance, IIUC) is what I want
>> to start from.
>>
>>
>> The warnings
>> ------------
>>
>> glmer gives two warnings:
>> 1: In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv, :
>> Model failed to converge with max|grad| = 392.339 (tol = 0.001)
>> 2: In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv, :
>> Model is nearly unidentifiable: very large eigenvalue
>> - Rescale variables?
>>
>>
>> The failure to converge I don't like, but OK. However, the "nearly
>> unidentifiable" I just don't understand. And what variables am I supposed
>> to rescale?
>>
>>
>> Removing the random effect for obs or for dataSetID does not eliminate the
>> warnings (and, in fact, that should not be the problem, since both obs and
>> dataSetID get non-zero estimates).
>>
>>
>> Other methods
>> -------------
>>
>> I have fitted the above model with:
>>
>> - MCMCglmm (using the priors list(R = list(V = 1, nu = 0.002), G = list(G1
>> = list(V = 1, nu = 0.002))))
>>
>>
>> - bglmer (with all as per default)
>>
>>
>> - glmer2stan (here using a more complex model with a bunch of interactions)
>>
>> - glmmadmb (with family "nbinom")
>>
>> and even if glmmadmb does complain about lack of convergence, there are no
>> further problems.
>>
>> Moreover, with MCMCglmm I've fitted the model with interactions. Again, no
>> apparent problems (beyond slow mixing in some cases).
>>
>>
>> In addition, I have also fitted several other models that include all
>> possible interactions, just to make sure again that I do not have something
>> silly in the model matrix. In some cases, the response is different (as
>> when fitting an lm/lmer). Beyond the lack of convergence of the lmer fit, no
>> other problems.
>>
>>
>> ## a GLM with interactions but, of course, without the dataSetID random
>> effect.
>>
>> glm(nS ~ Tree * Model * sh *
>> S.Type * S.Time * S.Size * Method,
>> data = NP.all,
>> family = poisson
>> )
>>
>> ## a linear model with interactions
>>
>> lm(Dissim ~ Tree * Model * sh *
>> S.Type * S.Time * S.Size * Method,
>> data = NP.all
>> )
>>
>> ## an lmer
>>
>> lmer(Dissim ~ Tree * Model * sh *
>> S.Type * S.Time * S.Size * Method +
>> (1|dataSetID),
>> data = NP.all,
>> control = lmerControl(
>> check.conv.singular="warning",
>> optCtrl = list(maxfun = 10000))
>> )
>> ## This complains about lack of convergence, but raises no identifiability
>> ## problems.
>>
>>
>>
>> (Lack of) Differences between fits
>> -----------------------------------
>>
>> If anyone wants, I can of course provide the output from the
>> fits. Anyway, the qualitative summary is that the estimates for the
>> fixed effects are very similar between glmer, glmmadmb, bglmer, and MCMCglmm, and
>> the estimates for the random effects are very similar for MCMCglmm,
>> bglmer, and glmer.
>>
>>
>>
>>
>>
>> So, what am I missing?
>>
>>
>> Best,
>>
>>
>>
>> R.
> --
> Ramon Diaz-Uriarte
> Department of Biochemistry, Lab B-25
> Facultad de Medicina
> Universidad Autónoma de Madrid
> Arzobispo Morcillo, 4
> 28029 Madrid
> Spain
> Phone: +34-91-497-2412
> Email: rdiaz02 at gmail.com
> ramon.diaz at iib.uam.es
> http://ligarto.org/rdiaz
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models