[R-sig-ME] Model selection GLM vs. GLMMs

Mon Oct 6 18:23:20 CEST 2014

On Sun, Oct 5, 2014 at 2:32 PM, Ben Bolker <bbolker at gmail.com> wrote:

> Nelida Villaseñor <nvillasenor at ...> writes:
>
> >
> > Thanks Ben.
> >
> > On: "I think you would find a bit of disagreement among experts about
> > the best procedure -- whether it would be to drop random effects until
> > you got a sensible non-singular fit, or to keep them in even though
> > they're singular"
> >
> > Could you please suggest me some references supporting each of these
> procedures?
> >
> > Cheers,
> > Nelida.
> >
>
>   I believe Barr et al (2013) has become the canonical reference
> for "put everything in the model".  Doug Bates has mentioned to
> me that he was working on a response, with some co-authors,
> that disagrees with Barr et al.; perhaps he will chime in on
> its status.

I have both practical and theoretical problems with the advice to "keep it
maximal". These problems are related to the inevitable overparameterization
of the model.  Two examples of fits of such models using the MixedModels
package for Julia are shown at the end of

http://nbviewer.ipython.org/github/dmbates/slides/blob/master/2014-10-01-Ithaca/lmm%20Examples.ipynb

In the first case, "bs10" - the data from Barr & Seyfeddinipur (2011),
there are 12 distinct items and 10 of the 20 nonlinear covariance
parameters in the model are associated with the random effects for item.
In the second case, "kb07" - the data from Kronmüller & Barr (2007), there
are 56 subjects and 32 items with 72 covariance matrix parameters overall,
36 for subject and 36 for item.  The data for each of these examples was
kindly provided by Dale Barr.

>From a display of standard deviations for the random effects and their
estimated correlations it is difficult to see that the estimated covariance
matrices are singular. However, if you look at the Cholesky factor, or a
singular value decomposition of the Cholesky factor, it is obvious that
these matrices are singular.

The practical difficulties are that there are so many highly nonlinear
parameters in these models that convergence to an optimum is difficult.
The log-likelihood function, even after profiling out the residual variance
and the fixed-effects parameters is very flat and the model will converge
on the boundary of the parameter space (the parameters on the diagonal of
the Cholesky factor are constrained to be non-negative and several of them
are exactly zero or very close to zero).  This is not unexpected - you can
tell before you start that this is going to happen because you are trying
to estimate so many highly nonlinear parameters from so few distinct levels
of the grouping factor.

In the Julia package I was able to code up the gradient of the
log-likelihood for these kinds of models, which helps considerably with the
reliability and the speed of convergence.  Even so, the second example
takes about 5 minutes to converge.  When I tried to fit this model in R
with lme4 it took a couple of hours and I needed to increase the number of
iterations allowed to 20,000.

So the question is, do you really expect to gain more information by
fitting a model that you can tell will be singular and that will take a
very long time to converge because of this singularity?  I can't see what
is gained.  Barr et al. have done simulations from which they conclude that
this is important but the model used in the simulations is not like
anything I have encountered in practice.  As a statistician I feel that
hypothesis tests should be done according to the "heredity principle" (i.e.
don't test for a main effect in the presence of a non-trivial interaction
involving that factor) unless there is a clear reason for violating the
principle.  As I understand the argument for "maximal" structure in the
model it says to put every possible interaction into the model before
testing for a main effect - advice that I find peculiar.  I don't know how
to make sense of the results of such a test.

> I have a book chapter in press
> (Fox, G. Negrete-Yankelevich, S. and Sosa, V. (Eds.) Ecological Statistics:
> from principles to application. Oxford University Press) that discusses
> this issue.
>
>   Suggestions from the rest of the community are welcome, as always.
>
>   Ben Bolker
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>

	[[alternative HTML version deleted]]