[R-sig-ME] interpreting significance from lmer results for dummies (like me)

Sun Apr 27 06:06:23 CEST 2008

Hi John,

On Sun, Apr 27, 2008 at 01:47:40PM +1000, John Maindonald wrote:
> HI Andrew -
> Maybe you'd like to send this to the list.  I'd meant to send my
> reply to the list.

Ok, cc-ing the list.

> My understanding is that, in the usual definition, designs are
> hierarchical when they have a hierarchical error structure,
> irrespective of whether the fixed effects are hierarchical. From
> the Wikipedia article on hierachical linear models:
> "Multilevel analysis allows variance in outcome variables to be
> analysed at multiple hierarchical levels, whereas in simple linear
> and multiple linear regression all effects are modeled to occur at
> a single level. Thus, HLM is appropriate for use with nested data."
> 
> This is different from the database notion of a hierarchical data
> structure.

Sure, I agree.  But I don't think that this definition excludes
designs with crossed random effects from being hierarchical.

> The Behrens-Fisher type issue (really only an issue for one small
> relevant df, when the bounds for the p-value can be very wide)
> makes p-values, in general, somewhat fraught in the mult-level
> modeling context.  To get unique p-values, one has to make some
> assumption about bounds on the relevant relative variances.  In
> part for this sort of reason, I do not really care whether what comes
> from mcmcsamp() closely resembles one or other choice of
> p-value.

Can you point me to a citation for that issue?

Cheers

Andrew

> Cheers -
> John.
> 
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> 
> 
> On 27 Apr 2008, at 12:54 PM, Andrew Robinson wrote:
> 
> >Thanks John - and I have interpolated replies.
> >
> >On Sun, Apr 27, 2008 at 12:04:52PM +1000, John Maindonald wrote:
> >>I have interpolated a few comments.
> >>
> >>On 26 Apr 2008, at 5:59 PM, Andrew Robinson wrote:
> >>
> >>>Hi Mark,
> >>>
> >>>On Fri, Apr 25, 2008 at 11:53:24PM -0400, Mark Kimpel wrote:
> >>>>I am a bioinformatistician, with my strongest background in  
> >>>>molecular
> >>>>biology. I have been trying to learn about mixed-effects to improve
> >>>>the
> >>>>analysis of my experiments, which certainly contain random effects.
> >>>>I will
> >>>>admit to being totally lost in the discussions regarding lack of p-
> >>>>value
> >>>>reporting in the current versions of lmer. Furthermore, I suspect
> >>>>those that
> >>>>need to publish to non-statistical journals will face reviewers who
> >>>>are
> >>>>equally in the dark. Where can I find a biologist-level explanation
> >>>>of the
> >>>>current controversy,
> >>>
> >>>I'll take a stab.
> >>>
> >>>1) the traditional, Fisher-style test of a null hypothesis is  
> >>>based on
> >>>computing the probability of observing a test statistic as extreme
> >>>or more extreme than the one actually observed, assuming that the
> >>>null hypothesis is true.  This probability is called the p-value.
> >>>If the p-value is less than some cut-off, e.g. 0.01, then the null
> >>>hypothesis is rejected.
> >>>
> >>>2) in order to compute that p-value, we need to know the cumulative
> >>>distribution function of the test statistic when the null
> >>>hypothesis is true. In simple cases this is easy: for example, we
> >>>use the t-distribution for the comparison of two normal means (with
> >>>assumed equal variances etc).
> >>>
> >>>3) in (many) hierarchical models the cumulative distribution  
> >>>function
> >>>of the test statistic when the null hypothesis is true is simply not
> >>>known.  So, we can't compute the p-value.
> >>
> >>It can though be simulated.  If, though, the assumptions are
> >>seriously wrong, all bets are off.  In particular normality of
> >>effects seems to me a much more often serious issue than in most
> >>models with a simple independent and identically distributed error
> >>structure, just because there are commonly many fewer independent
> >>values at the level that matters for the intended generalization(s).
> >
> >Agreed.  In fact, in a follow-up to the original questioner I also
> >mentioned the parametric bootstrap.  That doesn't help with the
> >assumptions, however.
> >
> >>Moreover one can get plausible information about the posterior
> >>distribution of the parameters from mcmcsamp(), even a Bayesian
> >>equivalent of a p-value.
> >
> >I'm not sure about the relaibility of those p-values yet.
> >
> >>>3a) in a limited range of hierarchical models that have historically
> >>>dominated analysis of variance, e.g. split-plot designs, the
> >>>reference distribution is known (it's F).
> >>
> >>Or known more or less well.
> >
> >Yes.
> >
> >>>3b) Numerous experts have (quite reasonably) built up a bulwark of
> >>>intuitive knowledge about the analysis of such designs.
> >>>
> >>>3c) the intuition does not necessarily pertain to the analysis of  
> >>>any
> >>>arbitrary hierarchical design, which might be unbalanced, and have
> >>>crossed random effects.  That is, the intuition might be applied,
> >>>but inappropriately.
> >>
> >>If it has crossed random effects, it is not hierarchical.  This is  
> >>where
> >>the approximations are most likely to break down.
> >
> >I think that it's still hierarchical with cross effects.  We still
> >have observations nested within levels of random effects.
> >
> >>>4) in any case, the distribution that is intuitively or otherwise
> >>>assumed is the F, because it works in the cases mentioned in 3a.
> >>>All that remains is to define the degrees of freedom.  The
> >>>numerator degrees of freedom are obvious, but the denominator
> >>>degrees of freedom are not known.
> >>>
> >>>4a) numerous other packages supply approximations to the denominator
> >>>degrees of freedom, eg Satterthwaite, and KR (which is related).
> >>>They have been subjected to a modest degree of scrutiny by
> >>>simulation.
> >>>
> >>>5) however, it is not clear that the reference distribution is  
> >>>really
> >>>F at all, and therefore it is not clear that correcting the
> >>>denominator degrees of freedom is what is needed.  Confusion reigns
> >>>on how the p-values should be computed.  And because of this
> >>>confusion, Doug Bates declines to provide p-values.
> >>
> >>Simulation can provide a definitive answer, if one believes the  
> >>model.
> >>I do not think that Doug has any fundamental objection to giving the
> >>KR approximation.  He has, reasonably, had other priorities.
> >
> >Yes, these things are true.
> >
> >>Nor has anyone, so far, offered an algorithm that uses the lmer  
> >>output
> >>structures to implement the KR approximation.
> >
> >Doug and I are looking at it, but it's slow going because he's busy
> >and I'm - well - slow.
> >
> >>>>how can I learn how to properly judge significance from my lmer
> >>>>results,
> >>>
> >>>There are numerous approximations, but no way to properly judge
> >>>significance as far as I am aware.  Try the R-wiki for algorithms,  
> >>>and
> >>>be conservative.
> >>
> >>As indicated above, there are other mechanisms, safer in general than
> >>using an approximation.  Users have to work at them though.
> >
> >I think that those are still approximations, but of a different order.
> >
> >>Note also that, when tests have a numerator that is a linear  
> >>combination
> >>of variances, there can be issues of a Behrens-Fisher type (see, e.g.
> >>the Wikepedia Behrens-Fisher article), where it is impossible to  
> >>derive
> >>a p-value that is valid independent of the relative magnitudes of the
> >>unknown variances. The Behrens-Fisher approximation is though,
> >>usually, a reasonable compromise.
> >
> >I didn't know about that; thanks.
> >
> >>>http://wiki.r-project.org/rwiki/doku.php
> >>>
> >>>Or, use lme, report the p-values computed therein, and be aware that
> >>>they are not necessarily telling you exactly what you want to know.
> >>>
> >>>>and what peer-reviewed references can I steer reviewers
> >>>>towards?
> >>>
> >>>Not sure about that one.  I'm working on some simulations with Doug
> >>>but it's slow going, mainly because I'm chronically disorganised.
> >>>
> >>>>I understand, from other threads, that some believe a paradigm  
> >>>>shift
> >>>>away from p-values may be necessary, but I it is not clear to me
> >>>>what paradigm will replace this entrenced view. I can appreciate  
> >>>>the
> >>>>fact that there may be conflicting opinions about the best
> >>>>equations/algorithms for determining significance, but is there any
> >>>>agreement on the goal we are heading towards?
> >>>
> >>>The conflict is not about p-values per se, but about the way that  
> >>>they
> >>>are calculated.  I would bet that the joint goal is to find an
> >>>algorithm that provides robust, reasonable inference in a  
> >>>sufficiently
> >>>wide variety of cases that its implementation proves to be  
> >>>worthwhile.
> >>
> >>There is an argument about p-values per se,  One should, arguably, be
> >>examining the profile likelihood, or the whole of the posterior
> >>distribution.
> >
> >"arguably"  ... that is an argument!  :)
> >
> >Andrew
> >
> >-- 
> >Andrew Robinson
> >Department of Mathematics and Statistics            Tel:  
> >+61-3-8344-6410
> >University of Melbourne, VIC 3010 Australia         Fax:  
> >+61-3-8344-4599
> >http://www.ms.unimelb.edu.au/~andrewpr
> >http://blogs.mbs.edu/fishing-in-the-bay/

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/