[R-sig-ME] interpreting significance from lmer results for dummies (like me)
Andrew Robinson
A.Robinson at ms.unimelb.edu.au
Sun Apr 27 06:06:23 CEST 2008
Hi John,
On Sun, Apr 27, 2008 at 01:47:40PM +1000, John Maindonald wrote:
> Hi Andrew -
> Maybe you'd like to send this to the list. I'd meant to send my
> reply to the list.
Ok, cc-ing the list.
> My understanding is that, in the usual definition, designs are
> hierarchical when they have a hierarchical error structure,
> irrespective of whether the fixed effects are hierarchical. From
> the Wikipedia article on hierarchical linear models:
> "Multilevel analysis allows variance in outcome variables to be
> analysed at multiple hierarchical levels, whereas in simple linear
> and multiple linear regression all effects are modeled to occur at
> a single level. Thus, HLM is appropriate for use with nested data."
>
> This is different from the database notion of a hierarchical data
> structure.
Sure, I agree. But I don't think that this definition excludes
designs with crossed random effects from being hierarchical.
> The Behrens-Fisher type issue (really only an issue when a relevant
> df is small, in which case the bounds for the p-value can be very
> wide) makes p-values, in general, somewhat fraught in the multi-level
> modeling context. To get unique p-values, one has to make some
> assumption about bounds on the relevant relative variances. In part
> for this sort of reason, I do not really care whether what comes from
> mcmcsamp() closely resembles one or other choice of p-value.
Can you point me to a citation for that issue?
Cheers
Andrew
> Cheers -
> John.
>
> John Maindonald email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473 fax : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
>
>
> On 27 Apr 2008, at 12:54 PM, Andrew Robinson wrote:
>
> >Thanks John - and I have interpolated replies.
> >
> >On Sun, Apr 27, 2008 at 12:04:52PM +1000, John Maindonald wrote:
> >>I have interpolated a few comments.
> >>
> >>On 26 Apr 2008, at 5:59 PM, Andrew Robinson wrote:
> >>
> >>>Hi Mark,
> >>>
> >>>On Fri, Apr 25, 2008 at 11:53:24PM -0400, Mark Kimpel wrote:
> >>>>I am a bioinformatician, with my strongest background in molecular
> >>>>biology. I have been trying to learn about mixed-effects models to
> >>>>improve the analysis of my experiments, which certainly contain
> >>>>random effects. I will admit to being totally lost in the
> >>>>discussions regarding the lack of p-value reporting in the current
> >>>>versions of lmer. Furthermore, I suspect that those who need to
> >>>>publish in non-statistical journals will face reviewers who are
> >>>>equally in the dark. Where can I find a biologist-level explanation
> >>>>of the current controversy,
> >>>
> >>>I'll take a stab.
> >>>
> >>>1) the traditional, Fisher-style test of a null hypothesis is based
> >>>on computing the probability of observing a test statistic as
> >>>extreme or more extreme than the one actually observed, assuming
> >>>that the null hypothesis is true. This probability is called the
> >>>p-value. If the p-value is less than some cut-off, e.g. 0.01, then
> >>>the null hypothesis is rejected.
> >>>
> >>>2) in order to compute that p-value, we need to know the cumulative
> >>>distribution function of the test statistic when the null
> >>>hypothesis is true. In simple cases this is easy: for example, we
> >>>use the t-distribution for the comparison of two normal means (with
> >>>assumed equal variances etc).
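> >>>
> >>>For instance, a toy two-sample sketch in R (made-up numbers), where
> >>>pt() supplies that cumulative distribution function:
> >>>
> >>>  x <- c(5.1, 4.8, 5.6, 5.0)
> >>>  y <- c(4.2, 4.5, 4.1, 4.7)
> >>>  nx <- length(x); ny <- length(y)
> >>>  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)  # pooled variance
> >>>  tstat <- (mean(x) - mean(y)) / sqrt(sp2 * (1/nx + 1/ny))
> >>>  pval <- 2 * pt(-abs(tstat), df = nx + ny - 2)   # two-sided p-value
> >>>  all.equal(pval, t.test(x, y, var.equal = TRUE)$p.value)   # TRUE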
> >>>
> >>>3) in (many) hierarchical models the cumulative distribution
> >>>function of the test statistic when the null hypothesis is true is
> >>>simply not known. So, we can't compute the p-value.
> >>
> >>It can, though, be simulated. If the assumptions are seriously
> >>wrong, however, all bets are off. In particular, normality of
> >>effects seems to me to be a serious issue much more often than in
> >>most models with a simple independent and identically distributed
> >>error structure, just because there are commonly many fewer
> >>independent values at the level that matters for the intended
> >>generalization(s).
> >
> >Agreed. In fact, in a follow-up to the original questioner I also
> >mentioned the parametric bootstrap. That doesn't help with the
> >assumptions, however.
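> >
> >In case it helps, here is a minimal sketch of a parametric bootstrap
> >for a fixed effect, assuming a version of lme4 whose simulate() and
> >refit() methods are available (the data set and formula are made up):
> >
> >  library(lme4)
> >  full <- lmer(y ~ x + (1 | group), data = mydata, REML = FALSE)
> >  null <- lmer(y ~ 1 + (1 | group), data = mydata, REML = FALSE)
> >  obs  <- 2 * (logLik(full) - logLik(null))      # observed LR statistic
> >  nsim <- 999
> >  stat <- numeric(nsim)
> >  for (i in seq_len(nsim)) {
> >    ysim    <- simulate(null)[[1]]               # simulate under the null
> >    stat[i] <- 2 * (logLik(refit(full, ysim)) -
> >                    logLik(refit(null, ysim)))
> >  }
> >  (sum(stat >= obs) + 1) / (nsim + 1)            # bootstrap p-value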
> >
> >>Moreover one can get plausible information about the posterior
> >>distribution of the parameters from mcmcsamp(), even a Bayesian
> >>equivalent of a p-value.
> >
> >I'm not sure about the reliability of those p-values yet.
> >
> >>>3a) in a limited range of hierarchical models that have historically
> >>>dominated analysis of variance, e.g. split-plot designs, the
> >>>reference distribution is known (it's F).
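> >>>
> >>>For example, the classical balanced split-plot can be handled by
> >>>aov() with an Error() term, and the F test in each error stratum is
> >>>exact (the data frame and variable names here are made up):
> >>>
> >>>  fit <- aov(yield ~ variety * fert + Error(block/variety), data = sp)
> >>>  summary(fit)   # whole-plot and sub-plot strata, each with an F test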
> >>
> >>Or known more or less well.
> >
> >Yes.
> >
> >>>3b) Numerous experts have (quite reasonably) built up a bulwark of
> >>>intuitive knowledge about the analysis of such designs.
> >>>
> >>>3c) the intuition does not necessarily pertain to the analysis of
> >>>any arbitrary hierarchical design, which might be unbalanced, and
> >>>have crossed random effects. That is, the intuition might be
> >>>applied, but inappropriately.
> >>
> >>If it has crossed random effects, it is not hierarchical. This is
> >>where the approximations are most likely to break down.
> >
> >I think that it's still hierarchical with crossed effects. We still
> >have observations nested within levels of random effects.
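> >
> >For example (a sketch with made-up names), subjects and items crossed
> >rather than nested:
> >
> >  fm <- lmer(y ~ x + (1 | subject) + (1 | item), data = d)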
> >
> >>>4) in any case, the distribution that is intuitively or otherwise
> >>>assumed is the F, because it works in the cases mentioned in 3a.
> >>>All that remains is to define the degrees of freedom. The
> >>>numerator degrees of freedom are obvious, but the denominator
> >>>degrees of freedom are not known.
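> >>>
> >>>The denominator degrees of freedom can matter a great deal. A toy
> >>>illustration with a made-up F statistic:
> >>>
> >>>  pf(4.1, df1 = 1, df2 = 4,  lower.tail = FALSE)   # roughly 0.11
> >>>  pf(4.1, df1 = 1, df2 = 40, lower.tail = FALSE)   # roughly 0.05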
> >>>
> >>>4a) numerous other packages supply approximations to the denominator
> >>>degrees of freedom, e.g. Satterthwaite and Kenward-Roger (KR, which
> >>>is related). They have been subjected to a modest degree of scrutiny
> >>>by simulation.
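> >>>
> >>>(A sketch of the sort of interface one might want for lmer fits; the
> >>>package name, pbkrtest, and the function KRmodcomp() are assumptions
> >>>here rather than something lme4 itself provides:
> >>>
> >>>  library(pbkrtest)
> >>>  KRmodcomp(full, null)   # F test of the dropped terms, KR denominator df
> >>>
> >>>with full and null being nested lmer fits.)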
> >>>
> >>>5) however, it is not clear that the reference distribution is
> >>>really F at all, and therefore it is not clear that correcting the
> >>>denominator degrees of freedom is what is needed. Confusion reigns
> >>>on how the p-values should be computed. And because of this
> >>>confusion, Doug Bates declines to provide p-values.
> >>
> >>Simulation can provide a definitive answer, if one believes the
> >>model. I do not think that Doug has any fundamental objection to
> >>giving the KR approximation. He has, reasonably, had other
> >>priorities.
> >
> >Yes, these things are true.
> >
> >>Nor has anyone, so far, offered an algorithm that uses the lmer
> >>output structures to implement the KR approximation.
> >
> >Doug and I are looking at it, but it's slow going because he's busy
> >and I'm - well - slow.
> >
> >>>>how can I learn how to properly judge significance from my lmer
> >>>>results,
> >>>
> >>>There are numerous approximations, but no way to properly judge
> >>>significance as far as I am aware. Try the R-wiki for algorithms,
> >>>and be conservative.
> >>
> >>As indicated above, there are other mechanisms, safer in general than
> >>using an approximation. Users have to work at them though.
> >
> >I think that those are still approximations, but of a different order.
> >
> >>Note also that, when tests have a numerator that is a linear
> >>combination of variances, there can be issues of a Behrens-Fisher
> >>type (see, e.g., the Wikipedia Behrens-Fisher article), where it is
> >>impossible to derive a p-value that is valid independently of the
> >>relative magnitudes of the unknown variances. The Behrens-Fisher
> >>approximation is, though, usually a reasonable compromise.
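> >>
> >>The simplest case already shows this: the Welch two-sample t test,
> >>where the denominator df depend on the unknown variance ratio, so the
> >>Satterthwaite df are themselves an approximation (toy numbers):
> >>
> >>  x <- rnorm(8, mean = 0, sd = 1)
> >>  y <- rnorm(4, mean = 0, sd = 3)
> >>  t.test(x, y, var.equal = FALSE)   # Welch: df estimated, not n1 + n2 - 2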
> >
> >I didn't know about that; thanks.
> >
> >>>http://wiki.r-project.org/rwiki/doku.php
> >>>
> >>>Or, use lme, report the p-values computed therein, and be aware that
> >>>they are not necessarily telling you exactly what you want to know.
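> >>>
> >>>For example (a sketch with made-up names):
> >>>
> >>>  library(nlme)
> >>>  fm <- lme(y ~ x, random = ~ 1 | group, data = mydata)
> >>>  summary(fm)   # t-tables with lme's "inner-outer" denominator df
> >>>  anova(fm)     # conditional F tests using the same df convention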
> >>>
> >>>>and what peer-reviewed references can I steer reviewers
> >>>>towards?
> >>>
> >>>Not sure about that one. I'm working on some simulations with Doug
> >>>but it's slow going, mainly because I'm chronically disorganised.
> >>>
> >>>>I understand, from other threads, that some believe a paradigm
> >>>>shift away from p-values may be necessary, but it is not clear to
> >>>>me what paradigm will replace this entrenched view. I can
> >>>>appreciate the fact that there may be conflicting opinions about
> >>>>the best equations/algorithms for determining significance, but is
> >>>>there any agreement on the goal we are heading towards?
> >>>
> >>>The conflict is not about p-values per se, but about the way that
> >>>they are calculated. I would bet that the joint goal is to find an
> >>>algorithm that provides robust, reasonable inference in a
> >>>sufficiently wide variety of cases that its implementation proves to
> >>>be worthwhile.
> >>
> >>There is an argument about p-values per se. One should, arguably, be
> >>examining the profile likelihood, or the whole of the posterior
> >>distribution.
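> >>
> >>For instance (a sketch assuming a version of lme4 that provides
> >>profile() and confint() methods for fitted models, plus the lattice
> >>plot method; fit is a made-up lmer fit):
> >>
> >>  pr <- profile(fit)
> >>  confint(pr, level = 0.95)   # profile-likelihood intervals
> >>  lattice::xyplot(pr)         # look at the whole profile, not one cut-off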
> >
> >"arguably" ... that is an argument! :)
> >
> >Andrew
> >
--
Andrew Robinson
Department of Mathematics and Statistics Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/