[R-sig-ME] interpreting significance from lmer results for dummies (like me)

Andrew Robinson A.Robinson at ms.unimelb.edu.au
Mon Apr 28 11:02:58 CEST 2008


Thanks John - very interesting stuff!

Andrew

On Sun, Apr 27, 2008 at 05:27:50PM +1000, John Maindonald wrote:
> On Behrens-Fisher: see the Wikipedia article "Unsolved problems in
> statistics", and the references given there.
> 
> Linnik, Jurii (1968). Statistical Problems with Nuisance Parameters.
> American Mathematical Society. ISBN 0821815709.
> 
> The link given to the
> 
> My reading of the literature is that this is not so much an unsolved
> problem as one that has, within the p-value paradigm,  no satisfactory
> theoretical solution.   I take this to be a problem with the p-value
> paradigm.  For Bayesians there is no problem; all inference is
> conditional on one or other choice of prior.  The various frequentist
> "solutions" to the Behrens-Fisher problem involve one or other
> more limited form of conditioning.
> 
> Others may be able to pronounce more authoritatively.
> 
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> 
> 
> On 27 Apr 2008, at 2:06 PM, Andrew Robinson wrote:
> 
> >Hi John,
> >
> >On Sun, Apr 27, 2008 at 01:47:40PM +1000, John Maindonald wrote:
> >>Hi Andrew -
> >>Maybe you'd like to send this to the list.  I'd meant to send my
> >>reply to the list.
> >
> >Ok, cc-ing the list.
> >
> >>My understanding is that, in the usual definition, designs are
> >>hierarchical when they have a hierarchical error structure,
> >>irrespective of whether the fixed effects are hierarchical. From
> >>the Wikipedia article on hierarchical linear models:
> >>"Multilevel analysis allows variance in outcome variables to be
> >>analysed at multiple hierarchical levels, whereas in simple linear
> >>and multiple linear regression all effects are modeled to occur at
> >>a single level. Thus, HLM is appropriate for use with nested data."
> >>
> >>This is different from the database notion of a hierarchical data
> >>structure.
> >
> >Sure, I agree.  But I don't think that this definition excludes
> >designs with crossed random effects from being hierarchical.
> >
> >>The Behrens-Fisher type issue (really only an issue for one small
> >>relevant df, when the bounds for the p-value can be very wide)
> >>makes p-values, in general, somewhat fraught in the multi-level
> >>modeling context.  To get unique p-values, one has to make some
> >>assumption about bounds on the relevant relative variances.  In
> >>part for this sort of reason, I do not really care whether what comes
> >>from mcmcsamp() closely resembles one or other choice of
> >>p-value.
> >
> >Can you point me to a citation for that issue?
> >
> >Cheers
> >
> >Andrew
> >
> >
> >
> >
> >>Cheers -
> >>John.
> >>
> >>John Maindonald             email: john.maindonald at anu.edu.au
> >>phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> >>Centre for Mathematics & Its Applications, Room 1194,
> >>John Dedman Mathematical Sciences Building (Building 27)
> >>Australian National University, Canberra ACT 0200.
> >>
> >>
> >>On 27 Apr 2008, at 12:54 PM, Andrew Robinson wrote:
> >>
> >>>Thanks John - and I have interpolated replies.
> >>>
> >>>On Sun, Apr 27, 2008 at 12:04:52PM +1000, John Maindonald wrote:
> >>>>I have interpolated a few comments.
> >>>>
> >>>>On 26 Apr 2008, at 5:59 PM, Andrew Robinson wrote:
> >>>>
> >>>>>Hi Mark,
> >>>>>
> >>>>>On Fri, Apr 25, 2008 at 11:53:24PM -0400, Mark Kimpel wrote:
> >>>>>>I am a bioinformatician, with my strongest background in
> >>>>>>molecular biology. I have been trying to learn about
> >>>>>>mixed-effects to improve the analysis of my experiments, which
> >>>>>>certainly contain random effects. I will admit to being totally
> >>>>>>lost in the discussions regarding lack of p-value reporting in
> >>>>>>the current versions of lmer. Furthermore, I suspect those that
> >>>>>>need to publish to non-statistical journals will face reviewers
> >>>>>>who are equally in the dark. Where can I find a biologist-level
> >>>>>>explanation of the current controversy,
> >>>>>
> >>>>>I'll take a stab.
> >>>>>
> >>>>>1) the traditional, Fisher-style test of a null hypothesis is
> >>>>>based on computing the probability of observing a test statistic
> >>>>>as extreme or more extreme than the one actually observed,
> >>>>>assuming that the null hypothesis is true.  This probability is
> >>>>>called the p-value.  If the p-value is less than some cut-off,
> >>>>>e.g. 0.01, then the null hypothesis is rejected.
> >>>>>
> >>>>>2) in order to compute that p-value, we need to know the
> >>>>>cumulative distribution function of the test statistic when the
> >>>>>null hypothesis is true. In simple cases this is easy: for
> >>>>>example, we use the t-distribution for the comparison of two
> >>>>>normal means (with assumed equal variances etc).
> >>>>>
> >>>>>3) in (many) hierarchical models the cumulative distribution
> >>>>>function of the test statistic when the null hypothesis is true
> >>>>>is simply not known.  So, we can't compute the p-value.
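> >>>>>
> >>>>>To make (2) concrete, here is a toy sketch in R with made-up
> >>>>>data, doing the pooled-variance t test "by hand" and then
> >>>>>checking it against t.test():
> >>>>>
> >>>>>  set.seed(1)
> >>>>>  x <- rnorm(10, mean = 0); y <- rnorm(10, mean = 1)
> >>>>>  sp2   <- (9 * var(x) + 9 * var(y)) / 18        # pooled variance
> >>>>>  tstat <- (mean(x) - mean(y)) / sqrt(sp2 * (1/10 + 1/10))
> >>>>>  2 * pt(-abs(tstat), df = 18)                   # two-sided p-value
> >>>>>  t.test(x, y, var.equal = TRUE)$p.value         # the same number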
> >>>>
> >>>>It can, though, be simulated.  If the assumptions are seriously
> >>>>wrong, all bets are off.  In particular, normality of effects
> >>>>seems to me to be a serious issue much more often than in most
> >>>>models with a simple independent and identically distributed error
> >>>>structure, just because there are commonly many fewer independent
> >>>>values at the level that matters for the intended
> >>>>generalization(s).
> >>>
> >>>Agreed.  In fact, in a follow-up to the original questioner I also
> >>>mentioned the parametric bootstrap.  That doesn't help with the
> >>>assumptions, however.
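> >>>
> >>>For the record, the sort of thing I have in mind is a parametric
> >>>bootstrap of the likelihood ratio statistic.  A rough, untested
> >>>sketch with a recent lme4 (the data frame 'dat' and the variables
> >>>'y', 'x' and 'grp' are placeholders):
> >>>
> >>>  library(lme4)
> >>>  full <- lmer(y ~ x + (1 | grp), data = dat, REML = FALSE)
> >>>  null <- lmer(y ~ 1 + (1 | grp), data = dat, REML = FALSE)
> >>>  obs  <- as.numeric(2 * (logLik(full) - logLik(null)))
> >>>
> >>>  nsim <- 999
> >>>  sim  <- numeric(nsim)
> >>>  for (i in seq_len(nsim)) {
> >>>    ystar  <- simulate(null)[[1]]   # response generated under the null
> >>>    sim[i] <- as.numeric(2 * (logLik(refit(full, ystar)) -
> >>>                              logLik(refit(null, ystar))))
> >>>  }
> >>>  mean(c(sim, obs) >= obs)          # bootstrap p-value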
> >>>
> >>>>Moreover one can get plausible information about the posterior
> >>>>distribution of the parameters from mcmcsamp(), even a Bayesian
> >>>>equivalent of a p-value.
> >>>
> >>>I'm not sure about the reliability of those p-values yet.
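> >>>
> >>>(For clarity, the quantity in question is something like the
> >>>following, where 'draws' holds posterior samples of one
> >>>fixed-effect coefficient, e.g. extracted from mcmcsamp() output or
> >>>from some other sampler:
> >>>
> >>>  p.bayes <- 2 * min(mean(draws > 0), mean(draws < 0))
> >>>  quantile(draws, c(0.025, 0.975))   # a 95% credible interval
> >>>
> >>>That is a tail probability of the posterior, not a frequentist
> >>>p-value.)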
> >>>
> >>>>>3a) in a limited range of hierarchical models that have
> >>>>>historically dominated analysis of variance, e.g. split-plot
> >>>>>designs, the reference distribution is known (it's F).
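> >>>>>
> >>>>>(For reference, the classical recipe for such a design is the
> >>>>>stratified ANOVA; with placeholder names for a whole-plot factor
> >>>>>A, a sub-plot factor B, and plots nested in blocks:
> >>>>>
> >>>>>  summary(aov(y ~ A * B + Error(block/plot), data = dat))
> >>>>>
> >>>>>A is then tested against the whole-plot stratum, and B and A:B
> >>>>>against the sub-plot residuals.)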
> >>>>
> >>>>Or known more or less well.
> >>>
> >>>Yes.
> >>>
> >>>>>3b) Numerous experts have (quite reasonably) built up a bulwark of
> >>>>>intuitive knowledge about the analysis of such designs.
> >>>>>
> >>>>>3c) the intuition does not necessarily pertain to the analysis of
> >>>>>any arbitrary hierarchical design, which might be unbalanced, and
> >>>>>have crossed random effects.  That is, the intuition might be
> >>>>>applied, but inappropriately.
> >>>>
> >>>>If it has crossed random effects, it is not hierarchical.  This
> >>>>is where the approximations are most likely to break down.
> >>>
> >>>I think that it's still hierarchical with crossed effects.  We
> >>>still have observations nested within levels of random effects.
> >>>
> >>>>>4) in any case, the distribution that is intuitively or otherwise
> >>>>>assumed is the F, because it works in the cases mentioned in 3a.
> >>>>>All that remains is to define the degrees of freedom.  The
> >>>>>numerator degrees of freedom are obvious, but the denominator
> >>>>>degrees of freedom are not known.
> >>>>>
> >>>>>4a) numerous other packages supply approximations to the
> >>>>>denominator degrees of freedom, e.g. Satterthwaite, and
> >>>>>Kenward-Roger (KR, which is related).  They have been subjected
> >>>>>to a modest degree of scrutiny by simulation.
> >>>>>
> >>>>>5) however, it is not clear that the reference distribution is
> >>>>>really F at all, and therefore it is not clear that correcting
> >>>>>the denominator degrees of freedom is what is needed.  Confusion
> >>>>>reigns on how the p-values should be computed.  And because of
> >>>>>this confusion, Doug Bates declines to provide p-values.
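> >>>>>
> >>>>>(Mechanically, the F-based recipes all end with something like
> >>>>>
> >>>>>  pf(Fstat, df1 = ndf, df2 = ddf, lower.tail = FALSE)
> >>>>>
> >>>>>where Fstat and ndf come from the fit and the disagreement is
> >>>>>entirely about what to plug in for ddf.)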
> >>>>
> >>>>Simulation can provide a definitive answer, if one believes the
> >>>>model.  I do not think that Doug has any fundamental objection to
> >>>>giving the KR approximation.  He has, reasonably, had other
> >>>>priorities.
> >>>
> >>>Yes, these things are true.
> >>>
> >>>>Nor has anyone, so far, offered an algorithm that uses the lmer
> >>>>output structures to implement the KR approximation.
> >>>
> >>>Doug and I are looking at it, but it's slow going because he's busy
> >>>and I'm - well - slow.
> >>>
> >>>>>>how can I learn how to properly judge significance from my lmer
> >>>>>>results,
> >>>>>
> >>>>>There are numerous approximations, but no way to properly judge
> >>>>>significance as far as I am aware.  Try the R-wiki for
> >>>>>algorithms, and be conservative.
> >>>>
> >>>>As indicated above, there are other mechanisms, safer in general
> >>>>than using an approximation.  Users have to work at them though.
> >>>
> >>>I think that those are still approximations, but of a different order.
> >>>
> >>>>Note also that, when tests have a numerator that is a linear
> >>>>combination of variances, there can be issues of a Behrens-Fisher
> >>>>type (see, e.g., the Wikipedia Behrens-Fisher article), where it
> >>>>is impossible to derive a p-value that is valid independent of the
> >>>>relative magnitudes of the unknown variances.  The Behrens-Fisher
> >>>>approximation is, though, usually a reasonable compromise.
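> >>>>
> >>>>The two-sample case shows the flavour of it.  Welch's
> >>>>approximation, which is what t.test() does by default, replaces
> >>>>the unattainable exact reference distribution with a t on an
> >>>>estimated number of df:
> >>>>
> >>>>  set.seed(1)
> >>>>  x <- rnorm(8,  sd = 1)            # small group, small variance
> >>>>  y <- rnorm(40, sd = 4)            # larger group, larger variance
> >>>>  t.test(x, y)                      # Welch: approximate df
> >>>>  t.test(x, y, var.equal = TRUE)    # pooled t: can be poorly calibrated here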
> >>>
> >>>I didn't know about that; thanks.
> >>>
> >>>>>http://wiki.r-project.org/rwiki/doku.php
> >>>>>
> >>>>>Or, use lme, report the p-values computed therein, and be aware
> >>>>>that they are not necessarily telling you exactly what you want
> >>>>>to know.
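> >>>>>
> >>>>>For example, with a data set that ships with nlme (the p-values
> >>>>>here use nlme's default denominator-df rule):
> >>>>>
> >>>>>  library(nlme)
> >>>>>  fm <- lme(distance ~ age + Sex, random = ~ 1 | Subject,
> >>>>>            data = Orthodont)
> >>>>>  summary(fm)   # t-table with denominator df and p-values
> >>>>>  anova(fm)     # the corresponding F tests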
> >>>>>
> >>>>>>and what peer-reviewed references can I steer reviewers
> >>>>>>towards?
> >>>>>
> >>>>>Not sure about that one.  I'm working on some simulations with
> >>>>>Doug but it's slow going, mainly because I'm chronically
> >>>>>disorganised.
> >>>>>
> >>>>>>I understand, from other threads, that some believe a paradigm
> >>>>>>shift away from p-values may be necessary, but it is not clear
> >>>>>>to me what paradigm will replace this entrenched view. I can
> >>>>>>appreciate the fact that there may be conflicting opinions about
> >>>>>>the best equations/algorithms for determining significance, but
> >>>>>>is there any agreement on the goal we are heading towards?
> >>>>>
> >>>>>The conflict is not about p-values per se, but about the way
> >>>>>that they are calculated.  I would bet that the joint goal is to
> >>>>>find an algorithm that provides robust, reasonable inference in
> >>>>>a sufficiently wide variety of cases that its implementation
> >>>>>proves to be worthwhile.
> >>>>
> >>>>There is an argument about p-values per se.  One should, arguably,
> >>>>be examining the profile likelihood, or the whole of the posterior
> >>>>distribution.
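> >>>>
> >>>>(With a sufficiently recent lme4 one can at least inspect the
> >>>>profiled parameters directly, e.g.
> >>>>
> >>>>  library(lme4)
> >>>>  fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
> >>>>  pr  <- profile(fit)
> >>>>  confint(pr)           # profile-likelihood intervals
> >>>>  lattice::xyplot(pr)   # profile zeta plots
> >>>>
> >>>>rather than relying on a single p-value.)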
> >>>
> >>>"arguably"  ... that is an argument!  :)
> >>>
> >>>Andrew
> >>>
> >>>-- 
> >>>Andrew Robinson
> >>>Department of Mathematics and Statistics            Tel: +61-3-8344-6410
> >>>University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
> >>>http://www.ms.unimelb.edu.au/~andrewpr
> >>>http://blogs.mbs.edu/fishing-in-the-bay/
> >
> >-- 
> >Andrew Robinson
> >Department of Mathematics and Statistics            Tel: +61-3-8344-6410
> >University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
> >http://www.ms.unimelb.edu.au/~andrewpr
> >http://blogs.mbs.edu/fishing-in-the-bay/
> >
> >_______________________________________________
> >R-sig-mixed-models at r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/



