[R-sig-ME] Random effect variance = zero

Douglas Bates bates at stat.wisc.edu
Thu Aug 14 17:12:43 CEST 2014


On Thu, Aug 14, 2014 at 5:54 AM, Marco Plebani <marcoplebani85 at gmail.com>
wrote:

> Dear list members,
>
> Package blme has been suggested on other occasions for fixing issues with
> random effect variance = zero, but I do not understand the rationale
> behind it. What does blme do that lme4 does not? In which way do the two
> approaches differ? In particular:
>

I appreciate your asking these questions.  Personally, I don't regard
random-effect variances being estimated as zero as an issue that needs to
be "fixed".  To me this is a situation where the data indicate that the
model should be simplified, in the sense that the random effects term can
be removed without substantial loss of fidelity to the data.

The maximum likelihood or the REML criterion can be viewed as a trade-off
between model simplicity and fidelity to the data.  In cases where the
estimates converge to the boundary of the parameter space, such as a
variance component estimated as zero, the simpler model does not fit
significantly worse than the model with the random-effects term present.

From the context of the data we may "know" (or at least expect) that there
is variability between the groups that we should take into account.  An
estimate of zero does not indicate that there is no variability among the
groups; it indicates that there is no excess variability beyond what would
be induced by the residual, or per-observation, variability.  If you
simulate 50 observations, say from a Gaussian distribution, and arbitrarily
divide them into 10 groups of 5, there will be variability in the means of
these 10 groups even though no explicit group-to-group variability was
added to the simulation.
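
To make that thought experiment concrete, here is a small sketch in R (the
seed, data, and object names are mine, purely for illustration; they are not
part of the original discussion):

library(lme4)
set.seed(1)
dat <- data.frame(y     = rnorm(50),
                  group = factor(rep(1:10, each = 5)))

tapply(dat$y, dat$group, mean)  # the 10 group means differ, as they must

fit <- lmer(y ~ 1 + (1 | group), data = dat)
VarCorr(fit)  # the among-group SD is typically estimated at or very near 0

## The trade-off described above: refit by ML and compare with the model
## that drops the grouping factor entirely.  The likelihood-ratio statistic
## is 0 whenever the variance estimate is 0.
fitML <- refitML(fit)
fit0  <- lm(y ~ 1, data = dat)
2 * (logLik(fitML) - logLik(fit0))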

An estimate of zero for a variance component is a property of the model,
not of the underlying mechanism generating the data.  Remember George Box's
famous statement that "All models are wrong; some models are useful."

This is why I feel uncomfortable with assigning a prior that will have the
effect of pulling the estimate of a variance component away from zero. To
me, this is overruling the data.  If the data do not contain sufficient
information to distinguish between the model fits with and without the
random effects, then to me that indicates that you should report it as such.
This doesn't mean that you have affirmed the null hypothesis of the
between-group variance being zero.  It is much more likely that there is
insufficient data to estimate the parameters in a model of this level of
complexity.  Don't confuse absence of evidence with evidence of absence.

Estimation of variance and covariance components requires a large number of
groups.  It is important to realize this.  It is also important to realize
that in most cases you are not terribly interested in precise estimates of
variance components.  Sometimes you are, but a substantial portion of the
time you are using random effects to model subject-to-subject variability,
etc., and if the data don't provide sufficient subject-to-subject
variability to support the model, then drop down to a simpler model.  This
works in the case of a zero variance component; the other boundary cases,
where correlated random effects produce a singular estimate of the
covariance matrix, are more difficult to handle.
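
(As an aside, not in the original message: lme4 releases more recent than
this thread include helpers for spotting exactly these boundary cases.  A
brief sketch, reusing the simulated fit from above; check your installed
version before relying on these names.)

VarCorr(fit)     # a variance component estimated at (or numerically near) zero
isSingular(fit)  # TRUE when the estimate lies on the boundary
rePCA(fit)       # for vector-valued random effects, principal components of
                 # the estimated covariance matrix; near-zero components flag
                 # an over-specified random-effects structure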

In a Bayesian framework the choice of prior can allow you to pull the
parameter estimates away from uncomfortable values.  But it is this choice
that I find uncomfortable.  Suppose I analyze data using one prior and
reach some conclusions and then you analyze the same data with a different
choice of prior and reach other conclusions.  Are our conclusions based on
the data or on our prior beliefs?

In most cases this doesn't happen.  If the likelihood is much less diffuse
than the prior then the posterior distribution is dominated by the data,
not the prior. But it is exactly in the boundary cases that the likelihood
is very diffuse and the information is coming from the prior, not the data.
To me, this is a red flag.  Assigning a prior to pull back parameter
estimates from problematic values is, in my opinion, overruling the data.

I feel that the choice of prior should be justified on grounds other than
"it gives me the results that I want".  That is too harsh a criticism - no
reputable investigator would do such a thing on purpose but they may do so
by accident.  As with many concepts in statistics, the mathematics to
investigate the properties of priors is subtle and difficult.  Box and Tiao
in their book "Bayesian Inference in Statistical Analysis" appeal to the
concept of "data translated likelihood" to justify a locally uniform prior
on the logarithm of a variance.  This means that the prior pushes the
estimate of a standard deviation or variance towards zero, not away from
zero.
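
(Spelling out the change of variables behind that last claim, in my notation
rather than Box and Tiao's: a locally uniform prior on \log\sigma^2
transforms to a prior on \sigma^2 proportional to 1/\sigma^2,

  p(\log\sigma^2) \propto 1
    \quad\Longrightarrow\quad
  p(\sigma^2) = p(\log\sigma^2)\,
      \left| \frac{d\,\log\sigma^2}{d\,\sigma^2} \right|
    \propto \frac{1}{\sigma^2},

which places unbounded prior density at \sigma^2 = 0 and can therefore only
shrink the estimate towards zero, never pull it away.)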

I do admit that I haven't kept up with the literature on Bayesian inference,
so there may be better justifications for prior distributions on variance
components and covariance matrices for random effects.  I do think,
however, that there should be some justification outside the context of the
data for a choice of prior, especially in cases where the prior dominates
the likelihood.  In practice this means the cases where the estimates are
on the boundary or where the information on the variance components is very
diffuse.  Unfortunately, those cases are more common than we would like.
You must have a large number of groups before you can hope to have
precision on the estimate of a single variance component.  You must have a
very large number of groups before you can hope for precision of an
estimate of a covariance matrix for random effects.
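
(As a rough illustration of how diffuse that information typically is, one
can profile a variance component of a fitted model.  A sketch using the
sleepstudy data shipped with lme4 - not part of the original message:

library(lme4)
fm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
confint(fm, parm = ".sig01")  # profile interval for the among-subject SD

Even with the 18 subjects in that data set the interval is noticeably wide
relative to the point estimate, and with only a handful of groups it will
typically extend all the way down to 0.)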

> - what is the prior information that blme is using, and
> - how come blme still estimates parameter values and assigns p-values
> to them? According to my (very limited) knowledge of Bayesian stats, the
> outcome of the analysis should be an updated distribution of the possible
> parameter values.
>
> The available documentation about blme is limited and/or I could not find
> it. I realize that my question on blme hides another, much broader one, on
> how Bayesian stats work; regarding the latter, a suggestion of a good,
> practice-oriented reference book would be appreciated.
>
> Thank you in advance,
>
> Marco
>
> -----
> Marco Plebani, PhD candidate (Ecology) at the University of Zurich
> Institute of Evolutionary Biology and Environmental Studies
> http://www.ieu.uzh.ch/staff/phd/plebani.html
>
> On 13/ago/2014, at 12:00, r-sig-mixed-models-request at r-project.org wrote:
>
> > Date: Tue, 12 Aug 2014 12:35:10 -0400
> > Subject: Re: [R-sig-ME] Random effect variance = zero
> > From: bbolker at gmail.com
> > To: aurorepaligot at hotmail.com
> > CC: r-sig-mixed-models at r-project.org
> >
> >
> > Short answer: yes, very common outcome, especially with small numbers of
> random effects groups (e.g. <5).  See http://glmm.wikidot.com/faq ; blme
> package for 'regularizing' fits so this doesn't happen (at the expense of
> changing the statistical model slightly); http://rpubs.com/bbolker/4187 .
> >
> >
> >
> > On Tue, Aug 12, 2014 at 12:05 PM, Aurore Paligot <
> aurorepaligot at hotmail.com> wrote:
> >
> > Hello Everybody, I am new to using mixed models, and I would like some
> advice about some results that I obtained and that seem counter-intuitive
> to me.  As an output of a test, I obtained a variance of zero for a random
> factor.
> >
> > […] How is this possible?  Can it be considered a reasonable output?
> >
> > I found this information about the variance estimates of zero. Could
> this explanation apply to my study?
> >
> > "It is possible to end up with a school variance estimate of zero. This
> fact often puzzles the researcher since each school will most certainly not
> have the same mean test result. An estimated among-school variance being
> zero, however, does not mean that each school has the same mean, but rather
> that the clustering of the students within schools does not help explain
> any of the overall variability present in test results. In this case, test
> results of students can be considered as all independent of each other
> regardless if they are from the same school or not. "(
> http://www.cscu.cornell.edu/news/statnews/stnews69.pdf )
> >
> > If not, where could the problem come from? Is the formula that I used
> correct? Is a mixed-model appropriate for this type of question?
> >
> > I would really appreciate some clarification if someone already faced
> this type of problem !
> >
> > Best regards,
> >
> > Aurore
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
