[R-sig-ME] Model is nearly unidentifiable with glmer
Ramon Diaz-Uriarte
rdiaz02 at gmail.com
Tue Feb 25 11:48:11 CET 2014
Dear Martin,
On Mon, 24-02-2014, at 19:11, maechler at stat.math.ethz.ch wrote:
>>>>>> Ramon Diaz-Uriarte <rdiaz02 at gmail.com>
>>>>>> on Mon, 24 Feb 2014 16:46:06 +0100 writes:
>
> > I think I figured out what was happening. This is the summary, in case it
> > helps anyone else in the future.
>
> Thank you, Ramon,
> for this summary even though nobody seemed to have helped you.
>
> I think we lme4 developers should be quite happy if you could
> provide us with a reproducible example -- off-public if you
> want.
>
> If your real data is somewhat confidential,
> I think you could use
>
> new.y <- simulate(<fitted glmer>)
>
> and add new.y to your data.frame and subtract the original y
> (and then check that 'new.y' also shows the problem).
> We'd be happy to receive the
> save(<data-frame>, file = "RDU-glmer-ex.rda")
> data together with an R script using glmer() and producing the
> warnings you've been seeing.
I've left at
http://ligarto.org/rdiaz/nearly-unidentifiable.tar.gz
a small (230 KB) file with:
- data
- R code to show the problem
- two runs of that code on different machines
Best,
R.
>
> Thank you in advance.
> Martin Maechler, ETH Zurich
>
> > The problem seems to be caused by no variation at the "dataSetID" (the
> > random effect) level in some experimental level combinations.
>
>
> > If I fit the model to some subsets of the among-dataSetID combinations
> > (i.e., exactly the same design, except with fewer values for some factors),
> > I can get a "singular fit" warning and 0 for the variance of the random
> > effect (and ranef which are all zero). Sure enough, at least in some cases,
> > changing just a single value of the response gets rid of the warning (and
> > leads to non-zero variance of the random effect).
>
>
> > Now, when I use larger models that are a superset of the above singular fit
> > models, I get the "nearly unidentifiable" problem, even when the output
> > from the fit does not show 0 for the variance of the random effects (and
> > the warning will disappear if we change just a few values of the response).
>
>
> > As originally thought, the warning is not caused by some inherent problem
> > in the model matrix ---i.e., in the design---. But I could not identify
> > the problem because I was making wrong assumptions when seeing that the
> > random effects in the large models did not show 0 variance.
>
>
>
> > Best,
>
>
> > R.
>
>
>
>
> > On Wed, 19-02-2014, at 13:40, rdiaz02 at gmail.com wrote:
> >> Dear All,
> >>
> >> Summary:
> >> ========
> >>
> >> I am running some models with glmer that are giving, among others, the
> >> warning message
> >>
> >> In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv, :
> >> Model is nearly unidentifiable: very large eigenvalue
> >> - Rescale variables?
> >>
> >>
> >> but I cannot understand why the model is nearly unidentifable, based both
> >> on the details of the experimental design, and by comparison with other
> >> approaches (that run without complains and provide similar results).
> >> Moreover, I have no idea what I am supposed to rescale.
> >>
> >>
> >>
> >>
> >> Details:
> >> ========
> >>
> >> The call to glmer
> >> -----------------
> >>
> >> gt.0 <- glmer(nS ~ Tree + Model + sh + S.Type + S.Time + S.Size + Method +
> >> (1|dataSetID) + (1|obs),
> >> family = poisson,
> >> data = NP.all,
> >> control = glmerControl(
> >> check.conv.singular="warning",
> >> optCtrl = list(maxfun = 10000))
> >> )
> >>
> >> The model is for Poisson data, allowing for overdispersion (with the obs
> >> variable) and with a "dataSetID" random effect. (The glmerControl has been
> >> added to try to understand what might be happening).
> >>
> >>
> >>
> >> The design
> >> ----------
> >>
> >> The data come from a simulation experiment. For each combination of
> >>
> >> Tree * Model * sh * S.Type * S.Time * S.Size there are 20 replicate
> >> simulations (each identified by a dataSetID), so a full factorial
> >> design. Each variable is a factor variable (6 levels for Tree, 4 for Model,
> >> 2 for sh, 3 for S.Type, 2 for S.Time, 3 for S.Size)
> >>
> >> Each replicate simulation, identified by a "dataSetID", is subject to four
> >> Methods (a factor). That is why dataSetID is a random effect.
> >>
> >> There are no missing values.
> >>
> >> Thus, there are 17280 dataSetID groups (6*4*2*3*2*3*20), each with four
> >> observations, resulting in a total of 69120 observations. glmer does
> >> report these values just fine.
> >>
> >> Therefore, based on that design, I think I should be able to fit not just
> >> the model above, but a model with all possible interactions. In fact, that
> >> model (a saturated model in the log-linear parlance, IIUC) is what I want
> >> to start from.
> >>
> >>
> >> The warnings
> >> ------------
> >>
> >> glmer gives two warnings:
> >> 1: In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv, :
> >> Model failed to converge with max|grad| = 392.339 (tol = 0.001)
> >> 2: In checkConv(attr(opt, "derivs"), opt$par, checkCtrl = control$checkConv, :
> >> Model is nearly unidentifiable: very large eigenvalue
> >> - Rescale variables?
> >>
> >>
> >> The failure to converge I don't like, but OK. However, the "nearly
> >> unidentifiable" I just don't understand. And what variables am I supposed
> >> to rescale?
> >>
> >>
> >> Removing the random effect for obs or for dataSetID does not eliminate the
> >> warnings (and, in fact, that should not be the problem, since both obs and
> >> dataSetID get non-zero estimates).
> >>
> >>
> >> Other methods
> >> -------------
> >>
> >> I have fitted the above model with:
> >>
> >> - MCMCglmm (using the priors list(R = list(V = 1, nu = 0.002), G = list(G1
> >> = list(V = 1, nu = 0.002))))
> >>
> >>
> >> - bglmer (with all as per default)
> >>
> >>
> >> - glmer2stan (here using a more complex model with a bunch of interactions)
> >>
> >> - glmmadmb (with family "nbinom")
> >>
> >> and even if glmmadmb does complain about lack of convergence, there are no
> >> further problems.
> >>
> >> Moreover, with MCMCglmm I've fitted the model with interactions. Again, no
> >> apparent problems (beyond slow mixing in some cases).
> >>
> >>
> >> In addition, I have also fitted several other models that include all
> >> possible interactions, just to make sure again I do not have something
> >> silly in the model matrix. In some cases, the response is different (as
> >> when fitting a lm/lmer). Beyond the lack of convergence of lmer fit, no
> >> other problems.
> >>
> >>
> >> ## a GLM with interactions but, of course, without the dataSetID random
> >> effect.
> >>
> >> glm(nS ~ Tree * Model * sh *
> >> S.Type * S.Time * S.Size * Method,
> >> data = NP.all,
> >> family = poisson
> >> )
> >>
> >> ## a linear model with interactions
> >>
> >> lm(Dissim ~ Tree * Model * sh *
> >> S.Type * S.Time * S.Size * Method,
> >> data = NP.all
> >> )
> >>
> >> ## a lmers
> >>
> >> lmer(Dissim ~ Tree * Model * sh *
> >> S.Type * S.Time * S.Size * Method +
> >> (1|dataSetID),
> >> data = NP.all,
> >> control = lmerControl(
> >> check.conv.singular="warning",
> >> optCtrl = list(maxfun = 10000))
> >> )
> >> ## This complaints of lack of convergence, but no identifiability
> >> problems.
> >>
> >>
> >>
> >> (Lack of) Differences between fits
> >> -----------------------------------
> >>
> >> If anyone wants I can provide, of course, the output from the
> >> fits. Anyway, the qualitative summary is that the the estimates for the
> >> fixed are very similar between glmer, admb, bglmer, and MCMCglmm and
> >> the estimates for the random effects are very similar for MCMCglmm,
> >> bglmer, and glmer.
> >>
> >>
> >>
> >>
> >>
> >> So, what am I missing?
> >>
> >>
> >> Best,
> >>
> >>
> >>
> >> R.
>
> > --
> > Ramon Diaz-Uriarte
> > Department of Biochemistry, Lab B-25
> > Facultad de Medicina
> > Universidad Autónoma de Madrid
> > Arzobispo Morcillo, 4
> > 28029 Madrid
> > Spain
>
> > Phone: +34-91-497-2412
>
> > Email: rdiaz02 at gmail.com
> > ramon.diaz at iib.uam.es
>
> > http://ligarto.org/rdiaz
>
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain
Phone: +34-91-497-2412
Email: rdiaz02 at gmail.com
ramon.diaz at iib.uam.es
http://ligarto.org/rdiaz
More information about the R-sig-mixed-models
mailing list