[R-sig-ME] Interpretation of odd error variances in glmer

Thu Nov 19 16:03:54 CET 2009

On Thu, Nov 19, 2009 at 8:30 AM, Daniel Ezra Johnson
<danielezrajohnson at gmail.com> wrote:
> On Thu, Nov 19, 2009 at 9:24 AM, Brendan Halpin <brendan.halpin at ul.ie> wrote:
>>
>> On Thu, Nov 19 2009, Douglas Bates wrote:
>>
>>> It is quite legitimate for the ML estimates of a variance component to
>>> be zero.  It simply means that there is not enough variability
>>> accounted for by that term to warrant incorporating the term.
>>
>> Thanks for the swift response.
>>
>> Given that the estimate seems to be exactly zero, can I read your answer
>> to say that the term has somehow been dropped by the algorithm?
>
> For all intents and purposes, yes.

Indeed.  Saying that "the term has somehow been dropped by the
algorithm" implies that the model structure has been altered for a
special case.  That is not quite what is going on here.  The
likelihood is being maximized with respect to the fixed-effects
parameters and to parameters that correspond to the standard
deviations of the random effects, which must be greater than or equal
to zero.  This is a constrained optimization problem and in this case
convergence is on the boundary of the parameter region.

The likelihood for such models can be considered as balancing fidelity
to the data versus complexity of the model.  The simplest model is the
one that eliminates the random effects by setting all the standard
deviations to zero.  Generally the quality of the fit from such a
model is much worse than for models that allow non-zero standard
deviations.  In this case the improvement in the quality of the fit by
allowing a non-zero standard deviation for the random effect for this
term is not sufficient to overcome the increase in model complexity,
as measured by the likelihood.
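
To make the boundary case concrete, here is a minimal Python sketch.
It is an illustration only, not code from lme4, and it assumes a
balanced one-way Gaussian random-intercept model rather than the glmer
fit discussed in this thread.  The parameter theta is the ratio of the
random-effect standard deviation to the residual standard deviation.
The data sets `flat` and `shifted`, the function names, and the search
bound `upper` are all made up for the example.

```python
import math


def sums_of_squares(groups):
    """Within- and between-group sums of squares for balanced groups."""
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / len(groups)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ssb = n * sum((m - grand) ** 2 for m in means)
    return ssw, ssb


def profiled_deviance(theta, groups):
    """-2 log-likelihood with the mean and residual variance profiled out.

    theta = sd(random intercept) / sd(residual); defined for theta >= 0,
    including theta == 0 exactly.
    """
    a, n = len(groups), len(groups[0])
    total = a * n
    ssw, ssb = sums_of_squares(groups)
    lam = 1.0 + n * theta ** 2          # variance inflation of a group mean
    sigma2 = (ssw + ssb / lam) / total  # profiled residual variance
    return total * (1.0 + math.log(2.0 * math.pi * sigma2)) + a * math.log(lam)


def fit_theta(groups, upper=10.0, tol=1e-8):
    """Golden-section search for theta on the constrained range [0, upper].

    The upper bound is an arbitrary assumption for this toy example.
    """
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, upper
    while hi - lo > tol:
        c = hi - invphi * (hi - lo)
        d = lo + invphi * (hi - lo)
        if profiled_deviance(c, groups) < profiled_deviance(d, groups):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)


# Hypothetical data: four groups whose means barely differ ...
flat = [[-1.0, 0.1, 1.2], [-1.1, 0.0, 0.9], [-0.9, 0.2, 1.0], [-1.2, -0.1, 1.1]]
# ... and the same groups with clearly separated means.
shifted = [[y + s for y in g] for g, s in zip(flat, [-3.0, -1.0, 1.0, 3.0])]

theta_flat = fit_theta(flat)        # search is pushed to the boundary at 0
theta_shifted = fit_theta(shifted)  # interior optimum, well above 0
```

For `flat` the deviance increases as theta moves away from zero, so
the constrained optimum sits on the boundary.  The random-effect term
is still in the model; its estimated standard deviation is simply
zero.  For `shifted` the gain in fit outweighs the added complexity
and the optimum is in the interior.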

On a technical note, it took me a very long time to come up with a
formulation of the linear mixed-effects and generalized linear
mixed-effects models that allows the likelihood to be evaluated
smoothly as these parameters go to zero.  Evaluation of the
log-likelihood for such models is almost universally described in
terms of the inverse of the variance-covariance matrix for the random
effects, and that causes problems when you get zeros on the diagonal.
It is also a tricky part of trying to do Markov chain Monte Carlo
methods or similar calculations.  I haven't examined Jarrod's code for
MCMCglmm to see how that situation is handled there but I know that I
found it very difficult to work around zero or near-zero standard
deviations.
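
The trouble with zeros on the diagonal can be seen in a small Python
sketch.  It is an illustration under stated assumptions, not a
transcription of lme4 or MCMCglmm code: a balanced one-way random
intercept model, with the fixed-effect mean already subtracted so that
only the per-group residual sums are needed.  The first function uses
the inverse of the random-effect variance; the second writes the
random effect as b = theta * u, where theta is the relative standard
deviation and u is found by penalized least squares.  All names and
numbers here are hypothetical.

```python
def modes_precision_form(residual_sums, n, sigma2, sigma_b2):
    """Conditional modes written with the precision 1/sigma_b2.

    b_i = s_i / (n + sigma2 / sigma_b2); undefined when sigma_b2 == 0.
    """
    precision_ratio = sigma2 / sigma_b2  # ZeroDivisionError at sigma_b2 == 0
    return [s / (n + precision_ratio) for s in residual_sums]


def modes_relative_sd_form(residual_sums, n, theta):
    """Conditional modes written with theta = sigma_b / sigma.

    u_i minimizes sum_j (r_ij - theta * u)^2 + u^2, which gives
    u_i = theta * s_i / (1 + n * theta^2), and then b_i = theta * u_i.
    Nothing is inverted, so theta == 0 is an ordinary input.
    """
    u = [theta * s / (1.0 + n * theta ** 2) for s in residual_sums]
    return [theta * ui for ui in u]


# Hypothetical per-group sums of residuals, three observations per group.
sums = [3.0, -1.5, 0.5]
n_per_group = 3

# Away from the boundary the two forms agree (sigma_b2 = theta^2 * sigma2).
theta, sigma2 = 0.7, 2.0
b_precision = modes_precision_form(sums, n_per_group, sigma2, theta ** 2 * sigma2)
b_relative = modes_relative_sd_form(sums, n_per_group, theta)

# On the boundary only the relative-sd form can be evaluated.
b_boundary = modes_relative_sd_form(sums, n_per_group, 0.0)
```

With theta equal to zero the second form returns modes that are all
zero, whereas the first form cannot be evaluated at all.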

>> I've investigated a bit more and find that the zero variance seems to
>> occur with the combination of the inclusion of a department-level
>> covariate (depfemr), and the cross-classifying individual-level random
>> effect (ulid).

Understandable.  If the explanatory power of these random effects can
be expressed, at least partially, by other terms then the advantages
of having the random effects in the model are diminished and the
balance tips in favor of the model without these effects.

> I wasn't quite sure of the structure of the data here, but it raised a
> question for me. I understand that when a random effect is fully
> nested within a fixed effect, the penalty on the random effect
> resolves the singularity and allows estimation of both. (That is, if
> appropriate, you could model depfemr as a fixed effect?)
>
> But if/when two random effects are fully nested, as is frequently
> modeled and I think is the case here, how does the algorithm know how
> to assign the variance as between e.g. depfemr and ulid?
>
> Dan