[R] Shapiro-Wilk W value interpretation

Mon Oct 1 21:13:36 CEST 2007

Thanks for your answer. I will try to make myself clearer; please do not
hesitate to point out what I have got wrong.

> There is some confusion in your query.
> First, how do you know that your data are indeed normally distributed?
In this specific case I *know* that my data is normally distributed
because I generated it using a random number generator. I showed it as
an example for my question because I want to be sure I understand the
meaning of the W value.

> That's *not* what the p-value of the test says.

Then maybe I do not understand very well what the p-value reported by R
is. From my understanding, p-values are generally described as "the
smallest fixed level at which the null hypothesis can be rejected";
therefore, if I fix my significance level at 0.01, a p-value of 0.8791
indicates that the null hypothesis is NOT rejected.
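
To make the rule I am describing concrete, here is a minimal sketch in R.
The helper name `decide` is my own invention (it is not part of R), and the
level 0.01 is just the one I chose for my purposes:

```r
# Hypothetical helper (not part of R): compare a p-value with a fixed level.
decide <- function(p_value, alpha = 0.01) {
  if (p_value < alpha) "reject H_0" else "do not reject H_0"
}

decide(0.8791)   # the p-value from my test
decide(0.001)    # an arbitrary small p-value, for contrast
```

This only encodes the comparison of the p-value against the level; it says
nothing about W.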

> Consider the following result of the Shapiro-Wilk test applied to
> a vector x:
>
> data: x
> W = 0.9856, p-value = 0.988
>
> Here x was not sampled from a normal distribution (code at end).

Then how would those results be interpreted? And how would they
compare against:

W = 0.9932, p-value = 0.8996
obtained from the example given on the R shapiro.test help page
(http://stat.ethz.ch/R-manual/R-patched/library/stats/html/shapiro.test.html):
rnorm(100, mean = 5, sd = 3)

and against, for example:
W = 0.9479, p-value = 0.0006035
obtained from the second example on the same page:
runif(100, min = 2, max = 4)

It would seem to me that the p-value indeed follows the logic I
mentioned earlier: the first example IS a normal distribution
(it was generated by rnorm) and its p-value is > 0.01, which means
that H_0 will mostly be accepted, while the second example is NOT
normal (generated with runif) and its p-value is < 0.01, hence H_0
will usually be rejected. Here I assume H_0 is that "the sample
is normally distributed".
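
For reference, this is roughly how I run the two help-page examples and pull
out W and the p-value separately. The seed is arbitrary (my own choice, only
so that the run is repeatable), so the numbers will not match the ones I
quoted above:

```r
set.seed(1)  # arbitrary seed, an assumption of this sketch

x_norm <- rnorm(100, mean = 5, sd = 3)   # drawn from a normal distribution
x_unif <- runif(100, min = 2, max = 4)   # drawn from a uniform distribution

res_norm <- shapiro.test(x_norm)
res_unif <- shapiro.test(x_unif)

# shapiro.test() returns an "htest" object:
# W is stored in $statistic and the p-value in $p.value
c(W = unname(res_norm$statistic), p = res_norm$p.value)
c(W = unname(res_unif$statistic), p = res_unif$p.value)
```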

Is this right? The result given by your sample puzzled me at first, but
if you increase the sample size, the test gives a lower p-value:

set.seed(34); shapiro.test(rexp(30))
W = 0.8898, p-value = 0.004773

Which means it is MOST LIKELY not normally distributed (as the p-value
is < 0.01)
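
A single seed could of course be luck, so here is a small simulation sketch
of the same idea. The number of repetitions and the helper name
`reject_rate` are my own choices; it counts how often exponential samples of
size 10 and of size 30 give a p-value below 0.01:

```r
set.seed(34)
alpha <- 0.01
n_rep <- 2000  # simulated samples per sample size (arbitrary choice)

# Proportion of exponential samples of size n with p-value < alpha
reject_rate <- function(n) {
  p <- replicate(n_rep, shapiro.test(rexp(n))$p.value)
  mean(p < alpha)
}

rates <- c(n10 = reject_rate(10), n30 = reject_rate(30))
rates
```

If I understand correctly, the larger sample size should be rejected more
often, even though both come from the same non-normal distribution.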
>
> Second, the point of a p-value is to formalize decision-making
> so that critical regions of tests are converted to p-value intervals.
> Thus, your emphasis on the value of W is misplaced. It's
> not how small W is but how small it is for the given sample size,
> and the p-value takes care of the significance.
I know what the p-value means; what I do not know is what the W value
means. What I need to know is this: if I report somewhere that a certain
distribution is normally distributed because, after doing the
Shapiro-Wilk test, the results were W = 0.9989, p-value = 0.8791,
then I can say that the H_0 in the Shapiro-Wilk test is true because
the p-value is > 0.01, BUT I do not know what to say about W.
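
To see what "how small it is for the given sample size" could mean in
practice, I tried the following sketch. It simulates W for samples that
really are normal and looks at its lower quantiles for two sample sizes. The
seed, the number of repetitions and the helper name `sim_W` are my own
assumptions:

```r
set.seed(2007)  # arbitrary seed
n_rep <- 2000   # simulated samples per sample size (arbitrary choice)

# Simulate the W statistic for normal samples of size n
sim_W <- function(n) {
  replicate(n_rep, unname(shapiro.test(rnorm(n))$statistic))
}

# 1% and 5% quantiles of W when the data are truly normal
q10  <- quantile(sim_W(10),  c(0.01, 0.05))
q100 <- quantile(sim_W(100), c(0.01, 0.05))
rbind(n10 = q10, n100 = q100)
```

If this is right, a value of W that is unremarkable for n = 10 could be very
unusual for n = 100, which would be why W alone cannot be judged as "small".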

> (This is not to
> say, of course, that the distribution of W is not of interest.)
>

And this is where my question goes: what is the meaning of W for
the normal distribution? Specifically, what does W say *about* the
data? (Does it say something? or is it ), this when testing for
normality of course.
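
One thing I can at least check numerically is whether W behaves like the
squared correlation in a normal Q-Q plot. This reading is my own assumption,
not something taken from the help page, and `qnorm(ppoints(n))` is only an
approximation to the expected normal order statistics (shapiro.test uses its
own coefficients internally):

```r
set.seed(1)  # arbitrary seed
x <- rnorm(100, mean = 5, sd = 3)
n <- length(x)

W <- unname(shapiro.test(x)$statistic)

# Approximate expected normal order statistics (an assumption of this sketch)
scores <- qnorm(ppoints(n))

# Squared correlation between the sorted data and the normal scores
r2 <- cor(sort(x), scores)^2

c(W = W, r2 = r2)
```

If the two numbers come out close, then W near 1 would simply mean that the
sorted data line up well with what a normal sample of that size would look
like.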

> Finally, what exactly, in your view, is "the hypothesis"?
>
The hypothesis is H_0: the sample is normally distributed (isn't that
what Shapiro-Wilk aims to test?)

> I hope this doesn't sound too critical. I'm trying to be helpful.
>
Again, do not worry, and thank you VERY MUCH for your time answering
and reading this. Also, if my tone seems harsh in this mail, it is not
intended that way; I just tried to write down the facts.

Regards,

Omar

On 9/30/07, P Ehlers <ehlers at math.ucalgary.ca> wrote:
>
> Omar Baqueiro wrote:
> > Hello,
> >
> > I have tested a distribution for normality using the Shapiro-Wilk
> > statistic. The result of this is the following:
> >
> >
> >         Shapiro-Wilk normality test
> >
> > data: mydata
> > W = 0.9989, p-value = 0.8791
> >
> >
> > I know that the p-value > 0.05 (for my purposes) means that the data
> > IS normally distributed, but what I am not sure about is the W value:
> > what values tell me that the data is normally distributed? I know
> > that my data is normally distributed, but what I want to know is how
> > to interpret the W value. I have read that "if W is very small then
> > the distribution is probably not normally distributed", but how
> > "small" is "very small"? And also, what happens if, say, W = 0.000001
> > but the p-value is > my significance level (0.05)? Is the hypothesis
> > rejected?
> >
>
> Peter Ehlers
>
> > thank you!
> >
> > Omar
> >
>
> set.seed(34); shapiro.test(rexp(10))
>
>
>

-- 
Omar Baqueiro Espinosa
Computer Science PhD Candidate
Computer Systems Engineer
Workpage: www.csc.liv.ac.uk/~omar/
HomePage (spanish):http://www.baqueiro.co.uk/
PGP Key available at: www.csc.liv.ac.uk/~omar/pgp.html