[R-sig-ME] Fwd: same old question - lme4 and p-values

John Maindonald john.maindonald at anu.edu.au
Sun Apr 6 03:37:16 CEST 2008


I agree with Lorenz Gygax.  I'll come back to p-values below.

Confidence intervals (CIs) make, for me, a lot more sense than
p-values. The reality, though, is that users will interpret CIs as
some kind of probability statement.  For all practical purposes,
a CI is just the Bayesian credible interval that one gets with
some suitable "non-informative prior".  Why not then be specific
about the prior, and go with the Bayesian credible interval?
(There is an issue whether such a prior can always be found.
Am I right in judging this to be of no practical consequence?)

There are cases where the prior is informative in a sense that
breaks the nexus between the CI and a realistic Bayesian
credible interval.  A similar issue arises for a p-value: the
probability of the evidence given innocence (or freedom from
some rare disease; this is H0) is dramatically different from
the probability of innocence given the evidence, and the
difference may be between 1/100000 and 1/2.  Where the
Bayesian credible interval and the CI are dramatically different,
a p-value or CI can only mislead.  In the way that p-values are
commonly taught, it may take considerable strength of will to
avoid confusion between P(A | H0) and P(H0 | A)!
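To make the arithmetic concrete, here is a small Bayes-rule
calculation in R with made-up numbers (mine, purely for illustration)
in which P(A | H0) is 1/100000 while P(H0 | A) comes out close to 1/2:

p_E_given_H0 <- 1e-5   # probability of the evidence under H0 (innocence)
p_E_given_H1 <- 1e-2   # probability of the evidence under the alternative
p_H0         <- 0.999  # prior probability of H0
p_E_given_H0 * p_H0 /
  (p_E_given_H0 * p_H0 + p_E_given_H1 * (1 - p_H0))  # P(H0 | A), about 0.5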

For intervals for variances, the prior can matter a lot, if a
smallish number of independent pieces of information is used
to  estimate the variance and/or those pieces of information
have widely varying weights.  I guess that emphasizes how
insecure inference about variances can be.  It is much worse
than the common forms of CI indicate.

If one is to take abs(t) > 2 as indicating significance, this
corresponds, under iid Normal assumptions, to a p-value of about
0.10 at 5 degrees of freedom and about 0.18 at 2 degrees of
freedom.  One has to ask members of the relevant scientific
community whether they are comfortable with that, given also that
such p-values are especially suspect because of the small number
of degrees of freedom.  Or are we discussing experiments where
we always have at least 10 degrees of freedom?  If not, and
there is an insistence on making claims of "significance",
maybe we want abs(t) > 2.5 or abs(t) > 3.
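These numbers are easy to check in R; nothing is assumed beyond the
degrees of freedom already mentioned:

2 * pt(-2, df = 5)   # two-sided p-value for |t| = 2 at 5 df, about 0.10
2 * pt(-2, df = 2)   # at 2 df, about 0.18
qt(0.975, df = 5)    # |t| needed for p = 0.05 at 5 df, about 2.57
qt(0.975, df = 2)    # at 2 df, about 4.30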

I do not see any cogent reason to be concerned that the
distribution of the Bayesian p-value may, under H0, be far
from uniform on (0,1).  If this is a problem at all, it is
chiefly a problem for intervals for variances.

Why not then, for models fitted using lmer, a Bayesian HPD
interval, given that Douglas has made it so easy to calculate
these?  This seems to me especially pertinent if the emphasis
is on effect size.
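As a sketch only (the vector of draws below is just a stand-in for
what one would extract from an mcmcsamp() result, and the 0.95 level
is arbitrary):

library(coda)
set.seed(1)
draws <- rnorm(10000, mean = 0.8, sd = 0.3)  # placeholder for real MCMC draws
HPDinterval(as.mcmc(draws), prob = 0.95)     # highest posterior density interval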

None of these various measures is more than a very crude
summary of what has been achieved.  Plots of posterior density
estimates might be given for key parameters, ideally with some
indication of sensitivity to the prior (this would need more
than mcmcsamp()); a small sketch follows below.  In any case,
publish the data, so that the sceptical reader can make his/her
own checks, and/or use it in the design of future experiments,
and/or so that it can be used as a teaching resource.
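Something along these lines, again with a placeholder for the real
posterior draws, is all that is needed:

set.seed(1)
draws <- rnorm(10000, mean = 0.8, sd = 0.3)  # placeholder for real MCMC draws
plot(density(draws), main = "Posterior density (illustrative)")
abline(v = 0, lty = 2)                       # reference line at zero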

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 6 Apr 2008, at 12:13 AM, Martin Maechler wrote:

>>>>>> "Jon" == Jonathan Baron <baron at psych.upenn.edu>
>>>>>>   on Sat, 5 Apr 2008 07:21:19 -0400 writes:
>
>   Jon> On 04/05/08 12:10, Reinhold Kliegl wrote:
>
> [...]
>
>>> In perspective, I think the p-value problem will simply
>>> go away.
>
>   Jon> I'm not sure what you mean here.  If you mean to
>   Jon> replace them with confidence intervals, I have no
>   Jon> problem with that.  But, as a journal editor, I am
>   Jon> afraid that I will continue to insist on some sort of
>   Jon> evidence that effects are real.  This can be done in
>   Jon> many ways.  But too many authors submit articles in
>   Jon> which the claimed effects can result from random
>   Jon> variation, either in subjects ("participants*") or
>   Jon> items, and they don't correctly reject such alternative
>   Jon> explanations of a difference in means.
>
>   Jon> I have noticed a kind of split among those who comment
>   Jon> on this issue.  On the one side are those who are
>   Jon> familiar with fields such as epidemiology or economics
>   Jon> (excluding experimental economics), where the claim is
>   Jon> often made that "the null hypothesis is always false
>   Jon> anyway, so why bother rejecting it?"  These are the
>   Jon> ones interested in effect sizes, variance accounted
>   Jon> for, etc.  They are correct for this kind of research,
>   Jon> but there are other kinds of research.
>
>   Jon> On the other side, are those from (e.g.) experimental
>   Jon> psychology, where the name of the game is to design
>   Jon> experiments that are so well controlled that the null
>   Jon> hypothesis will be true if the effect of interest is
>   Jon> absent.  As a member of this group, when I read people
>   Jon> from the first group, I find it very discouraging.  It
>   Jon> is almost as if they are saying that what I work so
>   Jon> hard to try to do is impossible.
>
>   Jon> To get a little specific, although I found Gelman and
>   Jon> Hill's book very helpful on many points (and it does
>   Jon> not deny the existence of people like me), it is
>   Jon> written largely for members of the first group.  By
>   Jon> contrast, Baayen's book is written for people like me,
>   Jon> as is the Baayen, Davidson, and Bates article, "Mixed
>   Jon> effects modeling with crossed random effects for
>   Jon> subjects and items."
>
>   Jon> I'm afraid we do need significance tests, or confidence
>   Jon> intervals, or something.
>
> I agree even though I'm very deeply inside the camp of statisticians
> who know that all models are wrong but some are useful, and
> hence I do not "believe" any P-values (or exact confidence /
> credibility intervals).
>
> For those who need ``something like a P-value'', yesterday I heard
> Lorenz Gygax (also a subscriber here) propose reporting the
> "credibility of 0", possibly "2-sided", as a pseudo-P value,
> i.e. basically that would be
> 2 * k/n for an MCMC sample b_1, b_2, ..., b_n sorted increasingly,
> where k := min{k'; b_k' > 0}.
> The reasoning would be the following:
> Use the 1-to-1 correspondence between confidence intervals and
> testing, pretend that the credibility intervals are confidence
> intervals, and then look at the confidence level at which 0 falls
> exactly on the border of the credibility interval.
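One way to compute such a pseudo-P value from a vector of MCMC draws,
as I read the description above (taking the smaller of the two tail
proportions so that the sign of the effect does not matter), would be:

pseudo_p <- function(b) 2 * min(mean(b > 0), mean(b <= 0))
set.seed(1)
b <- rnorm(1000, mean = 0.5, sd = 0.3)  # placeholder for real MCMC draws
pseudo_p(b)                             # twice the smaller tail proportion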
>
> Yesterday after the talk, I found that a good idea.
> Just now, it seems a bit doubtful, since under the null
> hypothesis, I don't think such a pseudo P-value would be uniform
> in [0,1].
>
> Martin
>
>
>   Jon> * On "participants" vs. "subjects" see:
>   Jon> http://www.psychologicalscience.org/observer/getArticle.cfm?id=1549
>
>   Jon> -- Jonathan Baron, Professor of Psychology, University
>   Jon> of Pennsylvania Home page:
>   Jon> http://www.sas.upenn.edu/~baron Editor: Judgment and
>   Jon> Decision Making (http://journal.sjdm.org)
>
>   Jon> _______________________________________________
>   Jon> R-sig-mixed-models at r-project.org mailing list
>   Jon> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



