[R] p-values

Wed Apr 28 17:13:50 CEST 2004

On Tue, 27 Apr 2004 22:25:22 +0100 (BST)
(Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> wrote:

> On 27-Apr-04 Greg Tarpinian wrote:
> > I apologize if this question is not completely 
> > appropriate for this list.
> 
> Never mind! (I'm only hoping that my response is ... )
> 
> > [...]
> > This week I have been reading "Testing Precise
> > Hypotheses" by J.O. Berger & Mohan Delampady,
> > Statistical Science, Vol. 2, No. 3, 317-355 and
> > "Bayesian Analysis: A Look at Today and Thoughts of
> > Tomorrow" by J.O. Berger, JASA, Vol. 95, No. 452, p.
> > 1269 - 1276, both as supplements to my Math Stat.
> > course.
> > 
> > It appears, based on these articles, that p-values are
> > more or less useless.
> 
. . .
> I don't have these articles available, but I'm guessing
> that they stress the Bayesian approach to inference.
> Saying "p-values are more or less useless" is controversial.
> Bayesians consider p-values to be approximately irrelevant
> to the real question, which is what you can say about
> the probability that a hypothesis is true/false, or
> what is the probability that a parameter lies in a
> particular range (sometimes the same question); and the
> "probability" they refer to is a posterior probability
> distribution on hypotheses, or over parameter values.
> The "P-value" which is emitted at the end of standard
> analysis is not such a probability, but instead is that part
> of a distribution over the sample space which is defined
> by a "cut-off" value of a test statistic calculated from the
> data. So they are different entities. Numerically they may
> coincide; indeed, for statistical problems with a certain
> structure the P-value is equal to the Bayesian posterior
> probability when a particular prior distribution is
> adopted.
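One case where the two numbers coincide can be sketched as follows. This is a minimal illustration under assumptions that are not in the message above: a normally distributed sample mean with known standard error, a one-sided null hypothesis mu <= 0, and a flat (improper uniform) prior on mu. The function names are hypothetical.

```python
from statistics import NormalDist


def one_sided_p_value(x_bar, se):
    """Frequentist tail area: P(sample mean >= x_bar) when mu = 0."""
    return 1.0 - NormalDist(0.0, se).cdf(x_bar)


def posterior_prob_null(x_bar, se):
    """Bayesian posterior P(mu <= 0 | x_bar) under a flat prior on mu.

    With a flat prior and known se, the posterior for mu is
    Normal(x_bar, se), so this is that distribution's CDF at 0.
    """
    return NormalDist(x_bar, se).cdf(0.0)


if __name__ == "__main__":
    for x_bar in (0.5, 1.0, 2.0, 3.0):
        p = one_sided_p_value(x_bar, se=1.0)
        post = posterior_prob_null(x_bar, se=1.0)
        print(f"x_bar={x_bar:4.1f}  p-value={p:.6f}  posterior={post:.6f}")
```

The agreement is a property of this particular structure and prior. With a different prior, or with a point null such as mu = 0, the two quantities can differ substantially.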
> 
> > If this is indeed the case,
> > then why is a p-value typically given as a default
> > output?  For example, I know that PROC MIXED and 
> > lme( ) both yield p-values for fixed effects terms.
> 
> P-values are not as useless as sometimes claimed. They
> at least offer a measure of discrepancy between data and
> hypothesis (the smaller the P-value, the more discrepant
> the data), and they offer this measure on a standard scale,
> the "probability scale" -- the chance of getting something
> at least as discrepant, if the hypothesis being tested is
> true. What "discrepant" objectively means is defined by
> the test statistic used in calculating the P-value: larger
> values of the test statistic correspond to more discrepant
> data.
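The description of a P-value quoted above can be made concrete with a small sketch. It assumes a two-sided z-test for a normal mean with known standard deviation; that choice of test, and the function name, are illustrative assumptions rather than anything specified in the thread.

```python
from statistics import NormalDist


def two_sided_p_value(sample_mean, mu0, sigma, n):
    """P-value for H0: mu = mu0 using a z-test with known sigma.

    The test statistic |z| defines what "discrepant" means here;
    the P-value is the chance, under H0, of a |z| at least this large.
    """
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Same hypothesis and sample size, increasingly discrepant sample means.
    for sample_mean in (0.1, 0.2, 0.4, 0.6):
        p = two_sided_p_value(sample_mean, mu0=0.0, sigma=1.0, n=25)
        print(f"sample mean {sample_mean:.1f}: P-value {p:.4f}")
```

Running the loop shows the P-value shrinking as the sample mean moves away from the hypothesised value, which is the "standard scale" reading of discrepancy described above.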

Ted, this opens up a can of worms, depending on what you mean by
"discrepant" and even "data" (something conditioned upon or a stochastic
quantity that we happen to only be looking at one copy of?).  I think your
statement plays into some of the severe difficulties with P-values,
especially large P-values.

> 
> Confidence intervals are essentially aggregates of hypotheses
> which have not been rejected at a significance level equal
> to 1 minus the confidence level.
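The link between confidence intervals and non-rejected hypotheses can be sketched numerically. The example below assumes a z-test with known standard error and takes the significance level to be one minus the confidence level (the usual convention); the grid search and all function names are illustrative assumptions.

```python
from statistics import NormalDist


def p_value(sample_mean, mu0, se):
    """Two-sided z-test P-value for H0: mu = mu0."""
    z = (sample_mean - mu0) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def interval_by_test_inversion(sample_mean, se, level=0.95, steps=20001):
    """Collect the hypothesised values mu0 that are NOT rejected.

    A grid of candidate mu0 values within 10 standard errors of the
    sample mean is tested one by one at alpha = 1 - level.
    """
    alpha = 1.0 - level
    grid = [sample_mean + se * (-10.0 + 20.0 * i / (steps - 1))
            for i in range(steps)]
    kept = [mu0 for mu0 in grid if p_value(sample_mean, mu0, se) >= alpha]
    return min(kept), max(kept)


def closed_form_interval(sample_mean, se, level=0.95):
    """Textbook interval: sample mean +/- z * se."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return sample_mean - z * se, sample_mean + z * se


if __name__ == "__main__":
    print(interval_by_test_inversion(5.0, 2.0))
    print(closed_form_interval(5.0, 2.0))
```

The two intervals agree up to the grid spacing, which is one way to see that the interval is just the set of hypotheses the test fails to reject.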
> 
> The P-value/confidence-interval approach (often called the
> "frequentist approach") gives results which do not depend
> on assuming any prior distribution on the parameters/hypotheses,
> and therefore could be called "objective" in that they
> avoid being accused of importing "subjective" information
> into the inference in the form of a Bayesian prior distribution.

They are objective only in the sense that the subjectivity is deferred, in a
difficult-to-document way, to the point where P-values are translated into
decisions.

> This can have the consequence that your confidence interval
> may include values in a range which, a priori, you do not
> accept as plausible; or exclude a range of values in which
> you are a priori confident that the real value lies.
> The Bayesian comment on this situation is that the frequentist
> approach is "incoherent", to which the frequentist might
> respond "well, I just got an unlucky experiment this time"
> (which is bound to occur with due frequency).
> 
> > The theory I am learning does not seem to match what
> > is commonly available in the software, and I am just
> > wondering why.
> 
> The standard ritual for evaluating statistical estimates
> and hypothesis tests is frequentist (as above). Rightly
> interpreted, it is by no means useless. For complex
> historical reasons, it has become the norm in "research
> methodology", and this is essentially why it is provided
> by the standard software packages (otherwise pharmaceutical
> companies would never buy the software, since they need
> this in order to get past the FDA or other regulatory
> authority). However, because this is the "norm", such
> results often have more meaning attributed to them than
> they can support, by people disinclined to delve into
> what "rightly interpreted" might mean.

The statement that frequentist methods are the norm, which I'm afraid is
usually true, is a sad comment on the state of much of "scientific"
inquiry.  IMHO P-values are so defective that the imperfect Bayesian
approach should be seriously entertained.

> 
> This is not a really clean answer to your question; but
> then your question touches on complex and conflicting
> issues!
> 
> Hoping this helps (and hoping that I am not poking a
> hornets' nest here)!
> Ted.
> 

---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University