[R-sig-ME] interpreting significance from lmer results for dummies (like me)

John Maindonald John.Maindonald at anu.edu.au
Sun Apr 27 09:27:50 CEST 2008


On Behrens-Fisher: see the Wikipedia article "Unsolved problems in
statistics", and the references given there.

Linnik, Jurii (1968). Statistical Problems with Nuisance Parameters.
American Mathematical Society. ISBN 0821815709.

My reading of the literature is that this is not so much an unsolved
problem as one that has, within the p-value paradigm,  no satisfactory
theoretical solution.   I take this to be a problem with the p-value
paradigm.  For Bayesians there is no problem; all inference is
conditional on one or other choice of prior.  The various frequentist
"solutions" to the Behrens-Fisher problem involve one or other
more limited form of conditioning.

Others may be able to pronounce more authoritatively.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 27 Apr 2008, at 2:06 PM, Andrew Robinson wrote:

> Hi John,
>
> On Sun, Apr 27, 2008 at 01:47:40PM +1000, John Maindonald wrote:
>> HI Andrew -
>> Maybe you'd like to send this to the list.  I'd meant to send my
>> reply to the list.
>
> Ok, cc-ing the list.
>
>> My understanding is that, in the usual definition, designs are
>> hierarchical when they have a hierarchical error structure,
>> irrespective of whether the fixed effects are hierarchical. From
>> the Wikipedia article on hierarchical linear models:
>> "Multilevel analysis allows variance in outcome variables to be
>> analysed at multiple hierarchical levels, whereas in simple linear
>> and multiple linear regression all effects are modeled to occur at
>> a single level. Thus, HLM is appropriate for use with nested data."
>>
>> This is different from the database notion of a hierarchical data
>> structure.
>
> Sure, I agree.  But I don't think that this definition excludes
> designs with crossed random effects from being hierarchical.
>
>> The Behrens-Fisher type issue (really only an issue for one small
>> relevant df, when the bounds for the p-value can be very wide)
>> makes p-values, in general, somewhat fraught in the multi-level
>> modeling context.  To get unique p-values, one has to make some
>> assumption about bounds on the relevant relative variances.  In
>> part for this sort of reason, I do not really care whether what comes
>> from mcmcsamp() closely resembles one or other choice of
>> p-value.
>
> Can you point me to a citation for that issue?
>
> Cheers
>
> Andrew
>
>
>
>
>> Cheers -
>> John.
>>
>> John Maindonald             email: john.maindonald at anu.edu.au
>> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
>> Centre for Mathematics & Its Applications, Room 1194,
>> John Dedman Mathematical Sciences Building (Building 27)
>> Australian National University, Canberra ACT 0200.
>>
>>
>> On 27 Apr 2008, at 12:54 PM, Andrew Robinson wrote:
>>
>>> Thanks John - and I have interpolated replies.
>>>
>>> On Sun, Apr 27, 2008 at 12:04:52PM +1000, John Maindonald wrote:
>>>> I have interpolated a few comments.
>>>>
>>>> On 26 Apr 2008, at 5:59 PM, Andrew Robinson wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> On Fri, Apr 25, 2008 at 11:53:24PM -0400, Mark Kimpel wrote:
>>>>>> I am a bioinformatician, with my strongest background in
>>>>>> molecular biology. I have been trying to learn about
>>>>>> mixed-effects to improve the analysis of my experiments, which
>>>>>> certainly contain random effects. I will admit to being totally
>>>>>> lost in the discussions regarding the lack of p-value reporting
>>>>>> in the current versions of lmer. Furthermore, I suspect that
>>>>>> those who need to publish in non-statistical journals will face
>>>>>> reviewers who are equally in the dark. Where can I find a
>>>>>> biologist-level explanation of the current controversy,
>>>>>
>>>>> I'll take a stab.
>>>>>
>>>>> 1) the traditional, Fisher-style test of a null hypothesis is based on
>>>>> computing the probability of observing a test statistic as extreme
>>>>> or more extreme than the one actually observed, assuming that the
>>>>> null hypothesis is true.  This probability is called the p-value.
>>>>> If the p-value is less than some cut-off, e.g. 0.01, then the null
>>>>> hypothesis is rejected.
>>>>>
>>>>> 2) in order to compute that p-value, we need to know the cumulative
>>>>> distribution function of the test statistic when the null
>>>>> hypothesis is true. In simple cases this is easy: for example, we
>>>>> use the t-distribution for the comparison of two normal means
>>>>> (with assumed equal variances etc).
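>>>>>
>>>>> For example, with simulated data (an illustration, not from the
>>>>> original message):
>>>>>
>>>>> set.seed(1)
>>>>> x <- rnorm(10)
>>>>> y <- rnorm(10, mean = 0.5)
>>>>> t.test(x, y, var.equal = TRUE)  # pooled t; reference distribution
>>>>>                                 # is t with 18 df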
>>>>>
>>>>> 3) in (many) hierarchical models the cumulative distribution
>>>>> function of the test statistic when the null hypothesis is true is
>>>>> simply not known.  So, we can't compute the p-value.
>>>>
>>>> It can, though, be simulated.  If the assumptions are seriously
>>>> wrong, however, all bets are off.  In particular, normality of
>>>> effects seems to me to be a serious issue much more often than in
>>>> most models with a simple independent and identically distributed
>>>> error structure, just because there are commonly many fewer
>>>> independent values at the level that matters for the intended
>>>> generalization(s).
>>>
>>> Agreed.  In fact, in a follow-up to the original questioner I also
>>> mentioned the parametric bootstrap.  That doesn't help with the
>>> assumptions, however.
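>>>
>>> A rough sketch of what such a bootstrap might look like (illustrative
>>> only; 'y', 'x', 'grp' and 'dat' are placeholder names, and it assumes
>>> simulate() and refit() methods are available for fitted lmer models
>>> in your version of lme4):
>>>
>>> library(lme4)
>>> full <- lmer(y ~ x + (1 | grp), data = dat, REML = FALSE)
>>> null <- lmer(y ~ 1 + (1 | grp), data = dat, REML = FALSE)
>>> obs  <- 2 * as.numeric(logLik(full) - logLik(null))  # observed LR stat
>>> sim  <- replicate(999, {
>>>     ynew <- unlist(simulate(null))      # data generated under the null
>>>     2 * as.numeric(logLik(refit(full, ynew)) - logLik(refit(null, ynew)))
>>> })
>>> mean(c(sim, obs) >= obs)                # simulated p-value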
>>>
>>>> Moreover one can get plausible information about the posterior
>>>> distribution of the parameters from mcmcsamp(), even a Bayesian
>>>> equivalent of a p-value.
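>>>>
>>>> For instance (a sketch only; the object returned by mcmcsamp() has
>>>> varied across lme4 versions, so check what your installation gives):
>>>>
>>>> samp <- mcmcsamp(fit, n = 10000)  # 'fit' stands for a fitted lmer model
>>>> HPDinterval(samp)                 # highest posterior density intervals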
>>>
>>> I'm not sure about the reliability of those p-values yet.
>>>
>>>>> 3a) in a limited range of hierarchical models that have
>>>>> historically dominated analysis of variance, e.g. split-plot
>>>>> designs, the reference distribution is known (it's F).
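>>>>>
>>>>> For instance, the classical split-plot analysis gives exact F tests
>>>>> within each error stratum (hypothetical variable names):
>>>>>
>>>>> summary(aov(yield ~ variety * fertiliser + Error(block/variety),
>>>>>             data = dat))
>>>>> # each stratum gets an F test with known denominator df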
>>>>
>>>> Or known more or less well.
>>>
>>> Yes.
>>>
>>>>> 3b) Numerous experts have (quite reasonably) built up a bulwark of
>>>>> intuitive knowledge about the analysis of such designs.
>>>>>
>>>>> 3c) the intuition does not necessarily pertain to the analysis of
>>>>> any arbitrary hierarchical design, which might be unbalanced, and
>>>>> have crossed random effects.  That is, the intuition might be
>>>>> applied, but inappropriately.
>>>>
>>>> If it has crossed random effects, it is not hierarchical.  This is
>>>> where the approximations are most likely to break down.
>>>
>>> I think that it's still hierarchical with crossed effects.  We still
>>> have observations nested within levels of random effects.
>>>
>>>>> 4) in any case, the distribution that is intuitively or otherwise
>>>>> assumed is the F, because it works in the cases mentioned in 3a.
>>>>> All that remains is to define the degrees of freedom.  The
>>>>> numerator degrees of freedom are obvious, but the denominator
>>>>> degrees of freedom are not known.
>>>>>
>>>>> 4a) numerous other packages supply approximations to the
>>>>> denominator degrees of freedom, e.g. Satterthwaite and KR (which
>>>>> is related).  They have been subjected to a modest degree of
>>>>> scrutiny by simulation.
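>>>>>
>>>>> For reference, the Satterthwaite-type approximation matches the
>>>>> first two moments of a linear combination of mean squares; a toy
>>>>> calculation with made-up numbers:
>>>>>
>>>>> ms <- c(12.4, 3.1)   # mean squares in the linear combination
>>>>> g  <- c(1, 0.5)      # their coefficients
>>>>> nu <- c(4, 36)       # their degrees of freedom
>>>>> sum(g * ms)^2 / sum((g * ms)^2 / nu)  # approximate denominator df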
>>>>>
>>>>> 5) however, it is not clear that the reference distribution is
>>>>> really F at all, and therefore it is not clear that correcting the
>>>>> denominator degrees of freedom is what is needed.  Confusion
>>>>> reigns on how the p-values should be computed.  And because of
>>>>> this confusion, Doug Bates declines to provide p-values.
>>>>
>>>> Simulation can provide a definitive answer, if one believes the
>>>> model.  I do not think that Doug has any fundamental objection to
>>>> giving the KR approximation.  He has, reasonably, had other
>>>> priorities.
>>>
>>> Yes, these things are true.
>>>
>>>> Nor has anyone, so far, offered an algorithm that uses the lmer
>>>> output structures to implement the KR approximation.
>>>
>>> Doug and I are looking at it, but it's slow going because he's busy
>>> and I'm - well - slow.
>>>
>>>>>> how can I learn how to properly judge significance from my lmer
>>>>>> results,
>>>>>
>>>>> There are numerous approximations, but no way to properly judge
>>>>> significance as far as I am aware.  Try the R-wiki for algorithms,
>>>>> and be conservative.
>>>>
>>>> As indicated above, there are other mechanisms, safer in general
>>>> than using an approximation.  Users have to work at them though.
>>>
>>> I think that those are still approximations, but of a different  
>>> order.
>>>
>>>> Note also that, when tests have a numerator that is a linear
>>>> combination of variances, there can be issues of a Behrens-Fisher
>>>> type (see, e.g., the Wikipedia Behrens-Fisher article), where it is
>>>> impossible to derive a p-value that is valid independent of the
>>>> relative magnitudes of the unknown variances.  The Behrens-Fisher
>>>> approximation is, though, usually a reasonable compromise.
>>>
>>> I didn't know about that; thanks.
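>>>
>>> For anyone following along, the two-sample version is easy to see
>>> with simulated data: Welch's approximate t test versus the
>>> pooled-variance test when the variances differ a lot.
>>>
>>> set.seed(2)
>>> x <- rnorm(5,  sd = 1)
>>> y <- rnorm(30, sd = 4)
>>> t.test(x, y)                    # Welch approximation (the default)
>>> t.test(x, y, var.equal = TRUE)  # pooled t; can be badly calibrated here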
>>>
>>>>> http://wiki.r-project.org/rwiki/doku.php
>>>>>
>>>>> Or, use lme, report the p-values computed therein, and be aware
>>>>> that they are not necessarily telling you exactly what you want to
>>>>> know.
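>>>>>
>>>>> For instance (nlme; 'y', 'x', 'grp' and 'dat' are placeholder
>>>>> names):
>>>>>
>>>>> library(nlme)
>>>>> fit <- lme(y ~ x, random = ~ 1 | grp, data = dat)
>>>>> summary(fit)  # fixed-effect table with t values and p-values
>>>>> anova(fit)    # F tests; denominator df from the grouping structure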
>>>>>
>>>>>> and what peer-reviewed references can I steer reviewers
>>>>>> towards?
>>>>>
>>>>> Not sure about that one.  I'm working on some simulations with
>>>>> Doug, but it's slow going, mainly because I'm chronically
>>>>> disorganised.
>>>>>
>>>>>> I understand, from other threads, that some believe a paradigm
>>>>>> shift away from p-values may be necessary, but it is not clear to
>>>>>> me what paradigm will replace this entrenched view. I can
>>>>>> appreciate the fact that there may be conflicting opinions about
>>>>>> the best equations/algorithms for determining significance, but
>>>>>> is there any agreement on the goal we are heading towards?
>>>>>
>>>>> The conflict is not about p-values per se, but about the way that
>>>>> they are calculated.  I would bet that the joint goal is to find
>>>>> an algorithm that provides robust, reasonable inference in a
>>>>> sufficiently wide variety of cases that its implementation proves
>>>>> to be worthwhile.
>>>>
>>>> There is an argument about p-values per se.  One should, arguably,
>>>> be examining the profile likelihood, or the whole of the posterior
>>>> distribution.
>>>
>>> "arguably"  ... that is an argument!  :)
>>>
>>> Andrew
>>>
>>> -- 
>>> Andrew Robinson
>>> Department of Mathematics and Statistics            Tel:
>>> +61-3-8344-6410
>>> University of Melbourne, VIC 3010 Australia         Fax:
>>> +61-3-8344-4599
>>> http://www.ms.unimelb.edu.au/~andrewpr
>>> http://blogs.mbs.edu/fishing-in-the-bay/
>
> -- 
> Andrew Robinson
> Department of Mathematics and Statistics            Tel:  
> +61-3-8344-6410
> University of Melbourne, VIC 3010 Australia         Fax:  
> +61-3-8344-4599
> http://www.ms.unimelb.edu.au/~andrewpr
> http://blogs.mbs.edu/fishing-in-the-bay/
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



