[R] shapiro wilk normality test

(Ted Harding) Ted.Harding at manchester.ac.uk
Mon Jul 14 01:16:55 CEST 2008


See at end.

On 13-Jul-08 21:42:19, Johannes Huesing wrote:
> Ted Harding <Ted.Harding at manchester.ac.uk> [Sun, Jul 13, 2008 at
> 10:59:21PM CEST]:
>> On 13-Jul-08 19:53:47, Johannes Huesing wrote:
>> > Frank E Harrell Jr <f.harrell at vanderbilt.edu> [Sun, Jul 13, 2008 at
>> > 08:07:37PM CEST]:
>> >> (Ted Harding) wrote:
>> >>> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
>> >>>> [...]
>> >>>> A large P-value means nothing more than needing more data.  No  
>> >>>> conclusion is possible.  
> [...]
> 
>> But "absence
>> of evidence", in my interpretation (which I believe is right for
>> the statistical context of "non-significant P-values"), means that
>> we do not know about A: we do not have enough information.
>> 
> 
> What would the p-value have to be like in your opinion to make the
> null hypothesis look more likely after the experiment than before?
> 
>> The proof is, basically, given in terms of a 2-valued logic where
>> every term is either TRUE or FALSE. In the real world we have at
>> least a third possible value: UNKNOWN (or, as R would put it, NA).
> 
> How would the probabilities that A is NA be affected by the outcome
> of an experiment like this? If this probability is affected, how
> does this leave the probability that A is T or F unaffected?
> 
> Or do you assign the NA status to the data collected?
> 
> A high p-value does not always mean that you might as well have 
> collected nothing but missing values. 
> 
> Of course I buy into the notion that a point estimate with a measure
> of accuracy is much better suited to describe your data; but a
> high p-value as a result of a test procedure that can be claimed to
> be adequately powered may defensibly be taken as a hint that we
> can for now stick with the null hypothesis.
> -- 
> Johannes Hüsing

I shall perhaps try later to respond in more detail to specific
points above. But, for the moment, let me say that I think your
statement "a high p-value as a result of a test procedure that
can be claimed to be adequately powered may defensibly be taken
as a hint that we can for now stick with the null hypothesis"
is the main key.

The power function of a test (which of course depends on the
design of the investigation and on its size, i.e. the number of
observations gathered) is basically much the same (in my mind) as
the amount of evidence.

A high P-value with a very powerful test serves to exclude
all alternatives to the Null Hypothesis except those which
lie very close to the Null Hypothesis.

In that sense, we do in fact have a lot of evidence against
all hypotheses except those which are very similar to the Null.
So we are not in an "absence of evidence" situation, and we
do have "evidence of absence".
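To make "a very powerful test excludes all alternatives except those close to the Null" concrete, here is a minimal sketch. It assumes a two-sided one-sample z-test of H0: mu = 0 with known sigma and normal data (a normal stand-in for the t-test). The helpers `power_two_sided` and `detectable_effect` are hypothetical names for illustration. The smallest effect detected with power about 0.9 shrinks like 1/sqrt(n), so a high P-value from a large study leaves only a narrow band of alternatives around H0 still plausible.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power_two_sided(delta, sigma, n, alpha=0.05):
    """Power of a two-sided z-test of H0: mu = 0 when the true mean is delta.

    Assumes known sigma and normally distributed data (a simplification).
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = delta * n ** 0.5 / sigma  # standardized true effect
    # The test statistic is N(shift, 1); reject when its absolute value exceeds z_crit.
    return (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)

def detectable_effect(sigma, n, alpha=0.05, power=0.9):
    """Approximate smallest |delta| detected with the given power.

    Uses the usual approximation that ignores the (tiny) far rejection tail.
    """
    return (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) * sigma / n ** 0.5

for n in (10, 100, 1000, 10000):
    d = detectable_effect(sigma=1.0, n=n)
    print(f"n={n:6d}: effects with |delta| >= {d:.3f} are detected with power ~0.9")
```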

The basic logic of a Hypothesis Test (in its standard sense)
is the generalisation, to a logic where certainty is at best
probabilistic, of the classical-logic argument:

Given (as a matter of fact): If A, then B
Observed: B is FALSE
Conclusion: A is FALSE

Probabilistically:
Given: If A (H0), then B has high probability
Observed: B is FALSE
Conclusion: An event (not-B) has occurred which has very
small probability if A is TRUE. Hence we (as George Barnard
used to put it) apply "The Principle of Disbelief in Tall Stories"
and disbelieve A to the extent that we disbelieve not-B as
a possible outcome from A (H0).

In applications, the event B will be specified in terms of
a set of possible values of a Test Statistic T, devised so
as to represent an interesting measure of discrepancy between
the data and the hypothesis H0 (e.g. the t-statistic for
testing whether two samples are drawn from populations with
equal means -- if that is the case, then E(T) = 0, and the
set of values {abs(T) > T0} will be a "discrepant set").

By choosing T0 such that Prob(abs(T) > T0 | H0) = p0, a small
value which we choose to suit ourselves, we are defining the
threshold at which we are prepared to deem that "the claim
that abs(T) > T0 is compatible with H0" is too unlikely to
be plausible.
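As a sketch of how T0 follows from p0: assume that under H0 the statistic T is approximately N(0, 1), which is the large-sample approximation to the t-statistic. For small samples one would use t quantiles instead (e.g. `scipy.stats.t.ppf`). The helper names below are hypothetical. For p0 = 0.05, 0.01 and 0.001 this gives T0 of about 1.96, 2.58 and 3.29.

```python
from statistics import NormalDist

def critical_value(p0):
    """T0 such that Prob(abs(T) > T0 | H0) = p0, assuming T ~ N(0, 1) under H0."""
    return NormalDist().inv_cdf(1 - p0 / 2)

def in_discrepant_set(t, p0):
    """True if the observed statistic t falls in the discrepant set {abs(T) > T0}."""
    return abs(t) > critical_value(p0)

for p0 in (0.05, 0.01, 0.001):
    print(p0, round(critical_value(p0), 3))
```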

The cleanest example in real life can be drawn from the basic
principle in criminal law for concluding that an accused person
is guilty, namely "The accused is deemed innocent until proved
guilty beyond reasonable doubt".

What constitutes "reasonable doubt" can become a very interesting
question, but there are some crimes for which it has a definite
statistical interpretation, typically exceeding some authorised
limit (of speed in a vehicle, of alcohol content in the blood
while driving a vehicle, or of polluting emissions from a factory
plant [which in the UK, under the Environmental Protection Act,
is a criminal offence]).

In the days when blood alcohol was determined by laboratory
analysis of a blood sample, it was possible to determine that
the "margin of error" corresponded to a P-value of about 0.001
(i.e. if the lab analysis yielded a result in excess of the
legal limit + 3*SE, then the inevitable result was a conviction
unless it could be independently proved in defence that the
statutory procedures had been carried out in a flawed manner).

So, in that case, "beyond reasonable doubt" meant "The P-value
of the data was less than about 1/1000".
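The arithmetic behind "about 1/1000" can be checked with a short sketch. It assumes normally distributed measurement error, an SE of 2mg/100ml (the figure given further on for the lab procedure), and conviction only when the reading exceeds the limit by 3*SE, the criterion used for the power figures below. The one-sided P-value of a reading exactly at that threshold, computed under H0: alc = 80, is 1 - Phi(3), roughly 0.00135.

```python
from statistics import NormalDist

LIMIT = 80.0  # legal limit, mg/100ml
SE = 2.0      # assumed SE of determination, mg/100ml (2% of the limit)

threshold = LIMIT + 3 * SE  # convict only if the reading exceeds this
# One-sided P-value of a reading exactly at the threshold, under H0: alc = LIMIT
p_at_threshold = 1 - NormalDist(mu=LIMIT, sigma=SE).cdf(threshold)

print(threshold, round(p_at_threshold, 5))
```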

But, if the lab analysis gave 80mg/100ml (the legal limit in
the UK), then at best you can conclude that the result equally
favoured any two hypotheses equidistant on either side of the
legal limit. But while this constitutes (in the sense explained)
absence of evidence for guilt (i.e. alc > 80), it certainly
does not exclude it (someone at 81, and therefore truly guilty,
could be quite likely to give a result of 80). So the "80" result
is not evidence of innocence -- it is merely lack of evidence of
guilt.
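A small sketch of why a reading of exactly 80 is uninformative, again assuming normal measurement error with SE 2mg/100ml. The likelihood of the reading is identical for true levels of 79 and 81. Someone truly at 81 (and so guilty) reads 80 or less with probability Phi(-0.5), roughly 0.31.

```python
from statistics import NormalDist

SE = 2.0        # assumed SE of determination, mg/100ml
reading = 80.0  # a result exactly at the legal limit

# The reading is equally likely under true values equidistant from 80.
lik_79 = NormalDist(mu=79.0, sigma=SE).pdf(reading)
lik_81 = NormalDist(mu=81.0, sigma=SE).pdf(reading)

# Chance that someone truly at 81 (guilty) produces a reading of 80 or less.
p_guilty_reads_low = NormalDist(mu=81.0, sigma=SE).cdf(reading)

print(abs(lik_79 - lik_81) < 1e-12, round(p_guilty_reads_low, 3))
```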

It gets worse in the environmental pollution situation. For
blood alcohol determined by lab analysis of a blood sample, the
procedure is only legally valid if it consistently achieves
an SE of determination of 2% or less (taken as 2mg/100ml for
results below 100).

Thus the power function has Power(alc) = 0.001 at alc=80,
Power(alc) = 0.5 at alc=86, and Power(alc) = 0.999 at alc=92.
So the innocent (alc <= 80) have good protection against
false conviction; the marginally guilty (80 < alc < 86, say)
are likely to get away with it; and the seriously guilty
(alc > 92) are almost certain to be convicted.
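The three power values quoted above can be reproduced with a minimal sketch. It assumes a constant SE of 2mg/100ml across this range ("taken as 2mg/100ml for results below 100"), normal measurement error, and conviction when the reading exceeds 80 + 3*SE = 86. The function name `power` is just for illustration.

```python
from statistics import NormalDist

LIMIT, SE = 80.0, 2.0
THRESHOLD = LIMIT + 3 * SE  # 86 mg/100ml

def power(alc):
    """Probability of conviction (reading > THRESHOLD) when the true level is alc."""
    return 1 - NormalDist(mu=alc, sigma=SE).cdf(THRESHOLD)

for alc in (80, 83, 86, 89, 92):
    print(alc, round(power(alc), 4))
```

Power at 80 and 92 comes out as about 0.00135 and 0.99865, which the text rounds to 0.001 and 0.999.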

However, the kinds of measurement which can be made of, say,
atmospheric pollution are subject to SEs which are more like 20%
and are often higher (50% or more). To achieve the requisite
"beyond reasonable doubt" (since it is a criminal offence) on
the same criterion (a reading 3*SE above the limit) means that
the procedure is only effective when the emission is, say, twice
the permitted level (or even more). Here we have lack of evidence
in a very real sense (the procedure is weak). It would be quite
possible for a polluter to emit well above the permitted level,
yet for the sampling to give a result well below the permitted
level. Hence, such absence of evidence is certainly not evidence
of absence.
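A rough sketch of the pollution case, under simplifying assumptions of my own: levels are measured in multiples of the permitted limit, measurement error is normal, and the SE is a fixed fraction of the limit (in reality it may grow with the true level). `pollution_check` is a hypothetical helper. For a polluter emitting 1.5 times the limit, a 20% SE gives a conviction probability of only about 0.31. With a 50% SE it is about 0.02, and the reading falls below the limit itself about 16% of the time.

```python
from statistics import NormalDist

def pollution_check(true_level, se_frac, limit=1.0):
    """Return (threshold, P(conviction), P(reading below the limit)).

    Levels are multiples of the permitted limit; the SE is assumed to be
    se_frac * limit, independent of the true level (a simplification).
    """
    se = se_frac * limit
    threshold = limit + 3 * se  # same 3*SE criterion as for blood alcohol
    reading = NormalDist(mu=true_level, sigma=se)
    return threshold, 1 - reading.cdf(threshold), reading.cdf(limit)

for se_frac in (0.2, 0.5):
    thr, p_conv, p_low = pollution_check(true_level=1.5, se_frac=se_frac)
    print(f"SE={se_frac:.0%}: threshold={thr:.1f}x limit, "
          f"P(convict | 1.5x)={p_conv:.3f}, P(reading < limit | 1.5x)={p_low:.3f}")
```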

And, if I understand correctly, this is pretty much what Frank
Harrell meant when he wrote "A large P-value means nothing more
than needing more data. No conclusion is possible.  Please read
the classic paper Absence of Evidence is not Evidence for Absence."
[Or "better data", one might add.] But it does need to be qualified
(as I try to do above) by considering where on the "effect" scale
the procedure becomes capable of doing its job, which in turn raises
the question of how large a departure from H0 actually matters in
real life, and so needs to be detected.
The blood-alcohol test does a reasonably good job (one is prepared
to accept a relatively narrow "grey area" where any conclusion
is unclear). The pollution test does not.
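One way to put a number on the "grey area" is a sketch under the same assumptions as above: normal error with a fixed SE, and conviction when the reading exceeds limit + 3*SE. The range of true levels over which the conviction probability rises from 0.001 to 0.999 is about 15% of the limit for blood alcohol (SE 2mg/100ml). For pollution with a 20% SE it is about 124% of the limit. `grey_area` and its default cut-offs are illustrative choices, not a standard definition.

```python
from statistics import NormalDist

Z = NormalDist()

def grey_area(limit, se, k=3.0, lo=0.001, hi=0.999):
    """True levels over which the conviction probability rises from lo to hi.

    Decision rule: convict if reading > limit + k*se, with normal error and fixed se.
    """
    threshold = limit + k * se
    # power(x) = 1 - Phi((threshold - x) / se) = p  =>  x = threshold - se * Phi^-1(1 - p)
    start = threshold - se * Z.inv_cdf(1 - lo)
    end = threshold - se * Z.inv_cdf(1 - hi)
    return start, end

for name, limit, se in (("blood alcohol", 80.0, 2.0), ("pollution, SE 20%", 1.0, 0.2)):
    start, end = grey_area(limit, se)
    print(f"{name}: grey area {start:.2f} to {end:.2f} "
          f"({(end - start) / limit:.0%} of the limit)")
```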

Mustn't go on too long!

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 14-Jul-08                                       Time: 00:16:50
------------------------------ XFMail ------------------------------



More information about the R-help mailing list