[R] normality test

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Fri Apr 29 19:21:19 CEST 2005


On 29-Apr-05 roger bos wrote:
> I looked carefully at ?shapiro.test and I did not see it state
> anywhere what the null hypothesis is or what a low p-value means.  I
> understand that I can run the example "shapiro.test(rnorm(100, mean =
> 5, sd = 3))" and deduce from its p-value of 0.0988 that the
> null-hypothesis must be normality, but why can't the help page
> explicitly state what the null hypothesis is.

Hi Roger,

Well, the opening line is

  Description:
       Performs the Shapiro-Wilk test for normality.

which does pretty strongly suggest that the hypothesis being
tested by shapiro.test(X) is normality of the distribution of X.

It might be just a shade more unambiguous if it were worded

       Performs the Shapiro-Wilk test of normality

or

       Performs the Shapiro-Wilk test for non-normality.

since testing "for" something, like testing "for" contamination,
tends to suggest testing for something exceptional, and testing
"for" contamination could equally be seen as a test "of" purity.
("Excuse me, sir. I just need to test your data for normality.
 And you're in trouble if they are.")

But all that is on the very margin of semantic finesse!
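
For anyone who wants to see the direction of the test in practice,
here is a minimal sketch. The seed, the sample size and the choice
of an exponential sample as the non-normal case are my own
assumptions for illustration; only shapiro.test() and the rnorm()
call come from the discussion above. The null hypothesis is that
the sample comes from a normal distribution, so a small p-value
counts as evidence against normality.

```r
# Illustrative only: seed and the exponential sample are arbitrary choices.
set.seed(42)

x_norm <- rnorm(100, mean = 5, sd = 3)  # sample from a normal distribution
x_skew <- rexp(100, rate = 1)           # sample from a strongly skewed distribution

# H0 for shapiro.test: the data come from a normal distribution.
p_norm <- shapiro.test(x_norm)$p.value
p_skew <- shapiro.test(x_skew)$p.value

# A small p-value is evidence AGAINST normality; a large one merely
# fails to reject it (it does not prove the data are normal).
cat("normal sample,      p-value:", format.pval(p_norm), "\n")
cat("exponential sample, p-value:", format.pval(p_skew), "\n")
```

The exponential sample should give a very small p-value; the normal
sample will usually (though not always) give a large one.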

> I also understand that the help pages are not meant to "teach"
> statistics, but stating the null hypothesis doesn't seem very
> difficult given the already considerable amount of time that probably
> went into creating these otherwise very good help pages.  Many people
> who use this software took stats classes 10 or more years ago and this
> stuff is easily forgotten.  Students frequently have trouble keeping
> the null and alternative hypothesis straight.
> 
> Just my $0.02.

I think the general approach in the help pages is to assume that
users understand the basics of what the function is about; the
page is there to specify what is necessary in order to get the
function to work correctly.

One can take your point that stating explicitly what the null
hypothesis of a test is would be useful for people who are not
sure about that sort of thing, and would advance their
statistical understanding at the same time as their proficiency
in R.

However, while this might be feasible for simple matters like
the null hypothesis being tested by a simple function like
shapiro.test or t.test (which, by the way, does not even hint
at what the null hypothesis might be: you have to infer it
from the options available for the alternative hypothesis),
it could get out of hand for tests applicable to more complex
situations like ANOVA, mixed models, and so on. There is a
danger, if the hypothesis were to be spelled out, that the
help page might become a small (or not so small) book on that
aspect of statistics.
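
On the t.test point, the null hypothesis can at least be read back
from the arguments and from the returned object. The sketch below
is only an illustration: the data, the seed and the choice of a
one-sample test are my own assumptions. The arguments mu and
alternative, and the components null.value and alternative of the
result, are part of t.test itself.

```r
# Illustrative only: simulated data with an arbitrary seed.
set.seed(1)
x <- rnorm(30, mean = 0.5, sd = 1)

# One-sample t-test. The null hypothesis is "true mean equals mu";
# 'alternative' says in which direction departures from it count.
res <- t.test(x, mu = 0, alternative = "two.sided")

res$null.value   # the hypothesised value of the mean under H0 (here 0)
res$alternative  # the form of the alternative ("two.sided")

print(res)       # the printed summary states the alternative in words
```

So the null is whatever value mu specifies, and the alternative
argument only determines which departures from it are tested for.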

A better place for such things is in documents like "Introductory
Statistics with R" and so on.

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 29-Apr-05                                       Time: 17:54:19
------------------------------ XFMail ------------------------------



