[R-sig-ME] Fwd: same old question - lme4 and p-values

Douglas Bates bates at stat.wisc.edu
Mon Apr 7 15:13:33 CEST 2008


On Sun, Apr 6, 2008 at 9:05 PM, David Henderson
<dnadave at revolution-computing.com> wrote:
> Hi John:

>  > For all practical purposes, a CI is just the Bayesian credible
>  > interval that one gets with some suitable "non-informative prior".
>  > Why not then be specific about the prior, and go with the Bayesian
>  > credible interval?  (There is an issue whether such a prior can
>  > always be found.  Am I right in judging this of no practical
>  > consequence?)

>  What?  Could you explain this a little more?  There is nothing
>  Bayesian about a classical CI (i.e. not a Bayesian credible set or
>  highest-posterior-density interval, or whatever terminology you
>  prefer).  The interpretation is completely different, and the
>  assumptions used in deriving the interval are also different.  Even
>  though the interval obtained with a noninformative prior may be
>  numerically similar to a classical CI, they are not the same entity.

>  Now, while I agree with the arguments about p-values and their
>  validity, there is one aspect missing from this discussion.  When
>  creating a general-use package like lme4, we are trying to create
>  software that enables statisticians and researchers to perform the
>  statistical analyses they need and interpret the results in ways
>  that HELP them get published.  While I admire Doug for "drawing a
>  line in the sand" in regard to the use of p-values in published
>  research, this is counter to HELPING researchers publish their
>  results.  There has to be a better way to further your point in the
>  community than FORCING it upon them.  Educating the next generation
>  of researchers and journal editors is admittedly slow, but it is a
>  much more community-friendly way of getting your point adopted in
>  practice.

Perhaps I should clarify.  The summary of a fitted lmer model does not
provide p-values because I don't know how to calculate them in an
acceptable way, not because I am philosophically opposed to them.  The
estimates and their approximate standard errors can readily be
calculated, as can their ratio.  The problem is determining the
appropriate reference distribution for that ratio from which to
calculate a p-value.  In fixed-effects models (under the "usual"
assumptions) that ratio has a t distribution with a known number of
degrees of freedom.  For mixed models it is not clear exactly what
distribution the ratio has, except in certain cases of completely
balanced data sets (i.e. the sort of data sets that occur in
textbooks).  At one time I used a t distribution with an upper bound
on the degrees of freedom, but I was persuaded that providing p-values
that could be strongly "anti-conservative" is worse than not providing
any.
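For concreteness, here is a minimal sketch in R of the quantities that
are easy to compute, using the sleepstudy data shipped with lme4
(illustrative only, not a recommended inference procedure):

  library(lme4)
  ## Fit a linear mixed model to the sleepstudy data
  fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
  ## The estimates, their approximate standard errors, and the ratio
  ## are all easy to compute ...
  est <- fixef(fm)
  se  <- sqrt(diag(as.matrix(vcov(fm))))
  est / se   # the "t value" column of summary(fm)
  ## ... but the reference distribution for that ratio is the open
  ## question, so no p-value is attached to it.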

That decision not to provide p-values is particularly inconvenient
for many users who are not especially interested in statistical
niceties but do need to satisfy editors or referees who want to see
p-values.  I know that is a real problem.  My earlier comment about
having created a monster that now turns on us, which touched off this
line of discussion, was more about the fact that we try to take
complex analyses and reduce the conclusions from them to a single
number, the p-value.  We can provide considerable information about
the models that are fit to the experimenter's data, but without
p-values the experimenter may be unable to publish the results.

The approach that I feel is most likely to be successful in
summarizing these models is first to obtain the REML or ML estimates
of the parameters, then to run a Markov chain Monte Carlo sampler to
assess the variability in the parameters (or, if you prefer, the
variability in the parameter estimators).  (Note: I am not advocating
using MCMC to obtain the estimates; I suggest MCMC only for assessing
the variability.)
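In code, the intended workflow would look roughly like this (a sketch
only; mcmcsamp and HPDinterval are the lme4 functions of that era, and
their arguments and return classes have varied across versions):

  ## First the point estimates, then an MCMC sample around them
  fm   <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
  samp <- mcmcsamp(fm, n = 10000)
  ## Summarize the variability in the sampled parameter values,
  ## e.g. with highest posterior density intervals
  HPDinterval(samp)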

The current version of the mcmcsamp function suffers from the
practical problem that it gets stuck at near-zero values of variance
components.  There are some approaches to dealing with that.  Over the
weekend I thought that I had a devastatingly simple way of dealing
with such cases until I reflected on it a bit more and realized that
it would require a division by zero.  Other than that, it was a good
idea.
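To see the failure mode, consider made-up data (a hypothetical
example; any grouping factor with a negligible effect will do):

  set.seed(1)
  d   <- data.frame(y = rnorm(60), g = gl(6, 10))
  fm0 <- lmer(y ~ 1 + (1 | g), d)
  VarCorr(fm0)   # the (1 | g) variance is estimated at or near zero
  ## A chain started at that boundary value tends to stay there:
  ## mcmcsamp(fm0, n = 1000)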

>  Just my $0.02...
>
>  Dave H
>  --
>  David Henderson, Ph.D.
>  Director of Community
>  REvolution Computing
>  1100 Dexter Avenue North, Suite 250
>  206-577-4778 x3203
>  DNADave at Revolution-Computing.Com
>  http://www.revolution-computing.com