[R-sig-ME] interpreting significance from lmer results for dummies (like me)

Sat Apr 26 09:59:21 CEST 2008

Hi Mark,

On Fri, Apr 25, 2008 at 11:53:24PM -0400, Mark Kimpel wrote:
> I am a bioinformatistician, with my strongest background in molecular
> biology. I have been trying to learn about mixed-effects to improve the
> analysis of my experiments, which certainly contain random effects. I will
> admit to being totally lost in the discussions regarding lack of p-value
> reporting in the current versions of lmer. Furthermore, I suspect those that
> need to publish to non-statistical journals will face reviewers who are
> equally in the dark. Where can I find a biologist-level explanation of the
> current controversy, 

I'll take a stab.

1) the traditional, Fisher-style test of a null hypothesis is based on
   computing the probability of observing a test statistic as extreme
   as, or more extreme than, the one actually observed, assuming that
   the null hypothesis is true.  This probability is called the
   p-value.  If the p-value is less than some cut-off, e.g. 0.01, then
   the null hypothesis is rejected.

2) in order to compute that p-value, we need to know the cumulative
   distribution function of the test statistic when the null
   hypothesis is true. In simple cases this is easy: for example, we
   use the t-distribution for the comparison of two normal means (with
   assumed equal variances etc).

3) in (many) hierarchical models the cumulative distribution function
   of the test statistic when the null hypothesis is true is simply not
   known.  So, we can't compute the p-value.  

3a) in a limited range of hierarchical models that have historically
    dominated analysis of variance, e.g. split-plot designs, the
    reference distribution is known (it's F).  

3b) Numerous experts have (quite reasonably) built up a bulwark of
    intuitive knowledge about the analysis of such designs.

3c) the intuition does not necessarily pertain to the analysis of any
    arbitrary hierarchical design, which might be unbalanced, and have
    crossed random effects.  That is, the intuition might be applied,
    but inappropriately.

4) in any case, the distribution that is intuitively or otherwise
   assumed is the F, because it works in the cases mentioned in 3a.
   All that remains is to define the degrees of freedom.  The
   numerator degrees of freedom are obvious, but the denominator
   degrees of freedom are not known.

4a) numerous other packages supply approximations to the denominator
    degrees of freedom, e.g. Satterthwaite, and Kenward-Roger (KR,
    which is related).  They have been subjected to a modest degree of
    scrutiny by simulation.

5) however, it is not clear that the reference distribution is really
   F at all, and therefore it is not clear that correcting the
   denominator degrees of freedom is what is needed.  Confusion reigns
   on how the p-values should be computed.  And because of this
   confusion, Doug Bates declines to provide p-values.
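
To make points 1), 2) and 4) concrete, here is a minimal sketch in
Python (standard library only), not anything taken from lmer.  It
computes a two-sided p-value from the t-distribution by numerically
integrating the t density with Simpson's rule.  For a test with one
numerator degree of freedom, F = t^2, so this is the same p-value an
F reference distribution would give.  The function names, the
statistic 3.0 and the candidate degrees of freedom are all assumptions
chosen for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with `df` degrees of freedom at x."""
    # Work on the log scale so large df does not overflow the gamma function.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) when T follows a t-distribution with `df` df.

    Integrates the density over [0, |t_stat|] with Simpson's rule
    (`steps` must be even) and subtracts twice that area from 1.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3
    return max(0.0, 1.0 - central_mass)


# Same observed statistic, different assumed denominator df.
# The value 3.0 and the df choices are hypothetical.
for df in (5, 20, 1000):
    print(df, two_sided_p(3.0, df))
```

With the 0.01 cut-off from point 1), a statistic of 3.0 is not
significant if the denominator has 5 degrees of freedom but is
significant if it has 20, which is why the choice in 4) and 4a)
matters.  None of this addresses point 5): if the reference
distribution is not really F (or t), no choice of degrees of freedom
makes the calculation exact.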

> how can I learn how to properly judge significance from my lmer
> results,

There are numerous approximations, but no way to properly judge
significance as far as I am aware.  Try the R-wiki for algorithms, and
be conservative.  

http://wiki.r-project.org/rwiki/doku.php

Or, use lme (from the nlme package), report the p-values computed
therein, and be aware that they are not necessarily telling you
exactly what you want to know.
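
One way to check whether an assumed reference distribution can be
trusted is simulation: generate data with the null hypothesis true,
compute the test many times, and see how often it rejects.  The
Python sketch below does this for a deliberately simple, balanced
design that is an assumption made for illustration (two treatment
arms, 5 groups per arm, 10 observations per group, treatment applied
to whole groups, group and residual standard deviations both 1).  It
compares a naive pooled t-test that ignores the grouping with a t-test
on the group means.  Neither is what lme or lmer computes, and the
function names are hypothetical.

```python
import random
import statistics

GROUPS_PER_ARM = 5
OBS_PER_GROUP = 10
# Two-sided 5% critical values from standard t tables.  They match the
# design sizes above only: 98 df for the naive test, 8 df for group means.
CRIT_NAIVE = 1.984
CRIT_MEANS = 2.306


def pooled_t(a, b):
    """Equal-variance two-sample t statistic for samples a and b."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / (
        pooled_var * (1 / na + 1 / nb)) ** 0.5


def simulate_rejection_rates(n_sims=2000, sd_group=1.0, sd_resid=1.0, seed=1):
    """Return (naive rate, group-mean rate) of rejections under a true null."""
    rng = random.Random(seed)
    naive_hits = means_hits = 0
    for _ in range(n_sims):
        arms = []
        for _arm in range(2):
            groups = []
            for _g in range(GROUPS_PER_ARM):
                b = rng.gauss(0.0, sd_group)  # random intercept for the group
                groups.append([b + rng.gauss(0.0, sd_resid)
                               for _ in range(OBS_PER_GROUP)])
            arms.append(groups)
        # Naive analysis: treat all observations in an arm as independent.
        flat = [[y for g in arm for y in g] for arm in arms]
        # Group-mean analysis: one value per group, so 8 denominator df.
        means = [[statistics.fmean(g) for g in arm] for arm in arms]
        naive_hits += abs(pooled_t(*flat)) > CRIT_NAIVE
        means_hits += abs(pooled_t(*means)) > CRIT_MEANS
    return naive_hits / n_sims, means_hits / n_sims


naive_rate, means_rate = simulate_rejection_rates()
print("naive:", naive_rate, "group means:", means_rate)
```

In this balanced case the group-mean test has a known reference
distribution, so its rejection rate should sit near the nominal 5%,
while the naive test rejects far more often.  For unbalanced or
crossed designs there is no such exact benchmark, which is where the
approximations need this kind of checking.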

> and what peer-reviewed references can I steer reviewers
> towards?

Not sure about that one.  I'm working on some simulations with Doug
but it's slow going, mainly because I'm chronically disorganised.

> I understand, from other threads, that some believe a paradigm shift
> away from p-values may be necessary, but it is not clear to me
> what paradigm will replace this entrenched view. I can appreciate the
> fact that there may be conflicting opinions about the best
> equations/algorithms for determining significance, but is there any
> agreement on the goal we are heading towards?

The conflict is not about p-values per se, but about the way that they
are calculated.  I would bet that the joint goal is to find an
algorithm that provides robust, reasonable inference in a sufficiently
wide variety of cases that its implementation proves to be worthwhile.

I hope that this was helpful.

Andrew

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/