[R-sig-ME] interpreting significance from lmer results for dummies (like me)
Andrew Robinson
A.Robinson at ms.unimelb.edu.au
Mon Apr 28 11:02:58 CEST 2008
Thanks John - very interesting stuff!
Andrew
On Sun, Apr 27, 2008 at 05:27:50PM +1000, John Maindonald wrote:
> On Behrens-Fisher, see the Wikipedia article "Unsolved problems in
> statistics", and the references given there.
>
> Linnik, Jurii (1968). Statistical Problems with Nuisance Parameters.
> American Mathematical Society. ISBN 0821815709.
>
> My reading of the literature is that this is not so much an unsolved
> problem as one that has, within the p-value paradigm, no satisfactory
> theoretical solution. I take this to be a problem with the p-value
> paradigm. For Bayesians there is no problem; all inference is
> conditional on one or other choice of prior. The various frequentist
> "solutions" to the Behrens-Fisher problem involve one or other
> more limited form of conditioning.
>
> Others may be able to pronounce more authoritatively.
>
> John Maindonald email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473 fax : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
>
>
> On 27 Apr 2008, at 2:06 PM, Andrew Robinson wrote:
>
> >Hi John,
> >
> >On Sun, Apr 27, 2008 at 01:47:40PM +1000, John Maindonald wrote:
> >>Hi Andrew -
> >>Maybe you'd like to send this to the list. I'd meant to send my
> >>reply to the list.
> >
> >Ok, cc-ing the list.
> >
> >>My understanding is that, in the usual definition, designs are
> >>hierarchical when they have a hierarchical error structure,
> >>irrespective of whether the fixed effects are hierarchical. From
> >>the Wikipedia article on hierarchical linear models:
> >>"Multilevel analysis allows variance in outcome variables to be
> >>analysed at multiple hierarchical levels, whereas in simple linear
> >>and multiple linear regression all effects are modeled to occur at
> >>a single level. Thus, HLM is appropriate for use with nested data."
> >>
> >>This is different from the database notion of a hierarchical data
> >>structure.
> >
> >Sure, I agree. But I don't think that this definition excludes
> >designs with crossed random effects from being hierarchical.
> >
> >>The Behrens-Fisher type issue (really only an issue for one small
> >>relevant df, when the bounds for the p-value can be very wide)
> >>makes p-values, in general, somewhat fraught in the multi-level
> >>modeling context. To get unique p-values, one has to make some
> >>assumption about bounds on the relevant relative variances. In
> >>part for this sort of reason, I do not really care whether what comes
> >>from mcmcsamp() closely resembles one or other choice of
> >>p-value.
> >
> >Can you point me to a citation for that issue?
> >
> >Cheers
> >
> >Andrew
> >
> >
> >
> >
> >>Cheers -
> >>John.
> >>
> >>
> >>On 27 Apr 2008, at 12:54 PM, Andrew Robinson wrote:
> >>
> >>>Thanks John - and I have interpolated replies.
> >>>
> >>>On Sun, Apr 27, 2008 at 12:04:52PM +1000, John Maindonald wrote:
> >>>>I have interpolated a few comments.
> >>>>
> >>>>On 26 Apr 2008, at 5:59 PM, Andrew Robinson wrote:
> >>>>
> >>>>>Hi Mark,
> >>>>>
> >>>>>On Fri, Apr 25, 2008 at 11:53:24PM -0400, Mark Kimpel wrote:
> >>>>>>I am a bioinformatician, with my strongest background in
> >>>>>>molecular biology. I have been trying to learn about
> >>>>>>mixed-effects models to improve the analysis of my
> >>>>>>experiments, which certainly contain random effects. I will
> >>>>>>admit to being totally lost in the discussions regarding the
> >>>>>>lack of p-value reporting in the current versions of lmer.
> >>>>>>Furthermore, I suspect that those who need to publish in
> >>>>>>non-statistical journals will face reviewers who are equally
> >>>>>>in the dark. Where can I find a biologist-level explanation
> >>>>>>of the current controversy,
> >>>>>
> >>>>>I'll take a stab.
> >>>>>
> >>>>>1) the traditional, Fisher-style test of a null hypothesis is
> >>>>>based on computing the probability of observing a test
> >>>>>statistic as extreme as or more extreme than the one actually
> >>>>>observed, assuming that the null hypothesis is true. This
> >>>>>probability is called the p-value. If the p-value is less than
> >>>>>some cut-off, e.g. 0.01, then the null hypothesis is rejected.
> >>>>>
> >>>>>2) in order to compute that p-value, we need to know the
> >>>>>cumulative distribution function of the test statistic when
> >>>>>the null hypothesis is true. In simple cases this is easy: for
> >>>>>example, we use the t-distribution for the comparison of two
> >>>>>normal means (with assumed equal variances, etc.).
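A minimal sketch of steps 1 and 2 in R, for the equal-variance
two-sample case (the data here are simulated purely for illustration):

    # Two groups assumed to share a common variance
    set.seed(42)
    x <- rnorm(10, mean = 0, sd = 1)
    y <- rnorm(12, mean = 1, sd = 1)
    # Pooled-variance t statistic and its t reference distribution
    nx <- length(x); ny <- length(y)
    sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
    tstat <- (mean(x) - mean(y)) / sqrt(sp2 * (1/nx + 1/ny))
    2 * pt(-abs(tstat), df = nx + ny - 2)   # two-sided p-value
    t.test(x, y, var.equal = TRUE)$p.value  # the same, built in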
> >>>>>
> >>>>>3) in (many) hierarchical models the cumulative distribution
> >>>>>function of the test statistic when the null hypothesis is
> >>>>>true is simply not known. So, we can't compute the p-value.
> >>>>
> >>>>It can, though, be simulated. If the assumptions are seriously
> >>>>wrong, however, all bets are off. In particular, non-normality
> >>>>of effects seems to me a serious issue much more often than in
> >>>>most models with a simple independent and identically
> >>>>distributed error structure, just because there are commonly
> >>>>many fewer independent values at the level that matters for the
> >>>>intended generalization(s).
> >>>
> >>>Agreed. In fact, in a follow-up to the original questioner I also
> >>>mentioned the parametric bootstrap. That doesn't help with the
> >>>assumptions, however.
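To make the parametric bootstrap concrete, here is a sketch for
testing a fixed effect, written against the current lme4 API
(simulate() and refit() for fitted models; availability in older
versions of the package varies):

    library(lme4)
    # Nested maximum-likelihood fits; sleepstudy ships with lme4
    m0 <- lmer(Reaction ~ 1 + (1 | Subject), sleepstudy, REML = FALSE)
    m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
    obs <- as.numeric(2 * (logLik(m1) - logLik(m0)))
    # Simulate responses under the null, refit both models,
    # and recompute the likelihood-ratio statistic each time
    sim <- replicate(999, {
      y <- simulate(m0)[[1]]
      as.numeric(2 * (logLik(refit(m1, y)) - logLik(refit(m0, y))))
    })
    (1 + sum(sim >= obs)) / (1 + length(sim))  # bootstrap p-value

The usual caveat applies: the simulated reference distribution is only
as good as the fitted null model, which is John's point about the
assumptions.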
> >>>
> >>>>Moreover one can get plausible information about the posterior
> >>>>distribution of the parameters from mcmcsamp(), even a Bayesian
> >>>>equivalent of a p-value.
> >>>
> >>>I'm not sure about the reliability of those p-values yet.
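For reference, the usage under discussion looked roughly like this in
the lme4 of the day (mcmcsamp() was later withdrawn from the package,
so treat this as a historical sketch rather than working code):

    library(lme4)
    fm <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
    samp <- mcmcsamp(fm, n = 1000)  # MCMC draws from the posterior
    HPDinterval(samp)               # highest posterior density intervals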
> >>>
> >>>>>3a) in a limited range of hierarchical models that have
> >>>>>historically dominated analysis of variance, e.g. split-plot
> >>>>>designs, the reference distribution is known (it's F).
> >>>>
> >>>>Or known more or less well.
> >>>
> >>>Yes.
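Those classical split-plot F tests come straight out of aov() with an
Error() term; a sketch using the oats data from the MASS package:

    library(MASS)  # for the oats split-plot data
    # Nitrogen is randomised to subplots within varieties within blocks
    summary(aov(Y ~ N * V + Error(B/V), data = oats))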
> >>>
> >>>>>3b) Numerous experts have (quite reasonably) built up a
> >>>>>bulwark of intuitive knowledge about the analysis of such
> >>>>>designs.
> >>>>>
> >>>>>3c) the intuition does not necessarily pertain to the analysis
> >>>>>of any arbitrary hierarchical design, which might be
> >>>>>unbalanced, and have crossed random effects. That is, the
> >>>>>intuition might be applied, but inappropriately.
> >>>>
> >>>>If it has crossed random effects, it is not hierarchical. This is
> >>>>where
> >>>>the approximations are most likely to break down.
> >>>
> >>>I think that it's still hierarchical with crossed effects. We
> >>>still have observations nested within levels of random effects.
> >>>
> >>>>>4) in any case, the distribution that is intuitively or
> >>>>>otherwise assumed is the F, because it works in the cases
> >>>>>mentioned in 3a. All that remains is to define the degrees of
> >>>>>freedom. The numerator degrees of freedom are obvious, but the
> >>>>>denominator degrees of freedom are not known.
> >>>>>
> >>>>>4a) numerous other packages supply approximations to the
> >>>>>denominator degrees of freedom, e.g. Satterthwaite, and KR
> >>>>>(which is related). They have been subjected to a modest
> >>>>>degree of scrutiny by simulation.
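By way of illustration, a sketch of a Kenward-Roger F test for lmer
fits using the pbkrtest package, an implementation that appeared after
this exchange:

    library(lme4)
    library(pbkrtest)
    # REML fits, as the KR approximation requires
    m1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
    m0 <- lmer(Reaction ~ 1 + (Days | Subject), sleepstudy)
    KRmodcomp(m1, m0)  # F statistic with KR-adjusted denominator df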
> >>>>>
> >>>>>5) however, it is not clear that the reference distribution is
> >>>>>really F at all, and therefore it is not clear that correcting
> >>>>>the denominator degrees of freedom is what is needed.
> >>>>>Confusion reigns over how the p-values should be computed, and
> >>>>>because of this confusion, Doug Bates declines to provide
> >>>>>p-values.
> >>>>
> >>>>Simulation can provide a definitive answer, if one believes the
> >>>>model. I do not think that Doug has any fundamental objection
> >>>>to giving the KR approximation. He has, reasonably, had other
> >>>>priorities.
> >>>
> >>>Yes, these things are true.
> >>>
> >>>>Nor has anyone, so far, offered an algorithm that uses the lmer
> >>>>output structures to implement the KR approximation.
> >>>
> >>>Doug and I are looking at it, but it's slow going because he's busy
> >>>and I'm - well - slow.
> >>>
> >>>>>>how can I learn how to properly judge significance from my
> >>>>>>lmer results,
> >>>>>
> >>>>>There are numerous approximations, but no way to properly
> >>>>>judge significance as far as I am aware. Try the R-wiki for
> >>>>>algorithms, and be conservative.
> >>>>
> >>>>As indicated above, there are other mechanisms, safer in
> >>>>general than using an approximation. Users have to work at
> >>>>them, though.
> >>>
> >>>I think that those are still approximations, but of a different
> >>>order.
> >>>
> >>>>Note also that, when tests have a numerator that is a linear
> >>>>combination of variances, there can be issues of a
> >>>>Behrens-Fisher type (see, e.g., the Wikipedia Behrens-Fisher
> >>>>article), where it is impossible to derive a p-value that is
> >>>>valid independently of the relative magnitudes of the unknown
> >>>>variances. The Behrens-Fisher approximation is, though,
> >>>>usually a reasonable compromise.
> >>>
> >>>I didn't know about that; thanks.
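The compromise John mentions is, in the two-sample case, what R's
t.test() does by default (Welch's approximate degrees of freedom); a
sketch with deliberately unequal variances:

    set.seed(1)
    x <- rnorm(8,  mean = 0, sd = 1)
    y <- rnorm(30, mean = 0, sd = 4)
    t.test(x, y)                    # Welch: approximate df, the default
    t.test(x, y, var.equal = TRUE)  # pooled-variance t, for contrast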
> >>>
> >>>>>http://wiki.r-project.org/rwiki/doku.php
> >>>>>
> >>>>>Or, use lme, report the p-values computed therein, and be
> >>>>>aware that they are not necessarily telling you exactly what
> >>>>>you want to know.
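A minimal sketch of that route with nlme, whose summary() reports
conditional t-tests with denominator degrees of freedom from its
containment heuristic (the Orthodont data ship with the package):

    library(nlme)
    fm <- lme(distance ~ age * Sex, random = ~ 1 | Subject,
              data = Orthodont)
    summary(fm)$tTable  # estimates, SEs, df, t- and p-values
    anova(fm)           # sequential F tests under the same df rules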
> >>>>>
> >>>>>>and what peer-reviewed references can I steer reviewers
> >>>>>>towards?
> >>>>>
> >>>>>Not sure about that one. I'm working on some simulations with
> >>>>>Doug, but it's slow going, mainly because I'm chronically
> >>>>>disorganised.
> >>>>>
> >>>>>>I understand, from other threads, that some believe a
> >>>>>>paradigm shift away from p-values may be necessary, but it is
> >>>>>>not clear to me what paradigm will replace this entrenched
> >>>>>>view. I can appreciate the fact that there may be conflicting
> >>>>>>opinions about the best equations/algorithms for determining
> >>>>>>significance, but is there any agreement on the goal we are
> >>>>>>heading towards?
> >>>>>
> >>>>>The conflict is not about p-values per se, but about the way
> >>>>>that they are calculated. I would bet that the joint goal is
> >>>>>to find an algorithm that provides robust, reasonable
> >>>>>inference in a sufficiently wide variety of cases that its
> >>>>>implementation proves to be worthwhile.
> >>>>
> >>>>There is an argument about p-values per se. One should,
> >>>>arguably, be examining the profile likelihood, or the whole of
> >>>>the posterior distribution.
> >>>
> >>>"arguably" ... that is an argument! :)
> >>>
> >>>Andrew
> >>>
--
Andrew Robinson
Department of Mathematics and Statistics Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/