[R] normality tests [Broadcast]
Martin Maechler
maechler at stat.math.ethz.ch
Mon May 28 12:01:21 CEST 2007
>>>>> "LuckeJF" == Lucke, Joseph F <Joseph.F.Lucke at uth.tmc.edu>
>>>>> on Fri, 25 May 2007 12:29:49 -0500 writes:
LuckeJF> Most standard tests, such as t-tests and ANOVA,
LuckeJF> are fairly resistant to non-normality for
LuckeJF> significance testing. It's the sample means that
LuckeJF> have to be normal, not the data. The CLT kicks in
LuckeJF> fairly quickly.
Even though such statements appear in too many (text)books,
in practice that is just plain wrong:
Even though the *level* of the t-test is resistant to non-normality,
the power is not at all!! And that makes the t-test NON-robust!
It's an easy exercise to see that the t-statistic ---> 1 when
one observation goes to +infinity, i.e., the t-test will never
reject when you have one extreme outlier; simple "proof" with R:
> t.test(11:20)
One Sample t-test
data: c(11:20)
t = 16.1892, df = 9, p-value = 5.805e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
13.33415 17.66585
sample estimates:
mean of x
15.5
## ---> unknown mean highly significantly different from 0
## But
> t.test(c(11:20, 1000))
One Sample t-test
data: c(11:20, 1000)
t = 1.1731, df = 10, p-value = 0.2679
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-94.42776 304.42776
sample estimates:
mean of x
105
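A minimal sketch extending the two calls above, assuming one simply keeps enlarging the single outlier. The helper name t_with_outlier and the grid of outlier values are illustrative and not from the original post. Algebraically, with n observations of which one, x, grows without bound, the mean behaves like x/n and the standard error s/sqrt(n) also behaves like x/n, so their ratio tends to 1. The last lines add the Wilcoxon signed-rank test as one possible distribution-free comparison; that choice is an assumption too.

```r
# One-sample t statistic for 11:20 plus a single outlier of growing size.
# t_with_outlier() is a hypothetical helper, not part of the original post.
t_with_outlier <- function(outlier, base = 11:20) {
  unname(t.test(c(base, outlier))$statistic)
}

outliers <- 10^(2:8)                      # illustrative grid of outlier values
t_values <- vapply(outliers, t_with_outlier, numeric(1))
print(data.frame(outlier = outliers, t = round(t_values, 4)))

# A rank-based test on the same contaminated sample, for comparison.
# It uses only signs and ranks, so the size of the outlier does not matter.
p_signed_rank <- wilcox.test(c(11:20, 1000))$p.value
print(p_signed_rank)
```

The printed t column should shrink towards 1 as the outlier grows, while the signed-rank p-value stays small because every observation is still positive.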
LuckeJF> Testing for normality prior to choosing a test
LuckeJF> statistic is generally not a good idea.
Definitely. Or even: It's a very bad idea ...
Martin Maechler, ETH Zurich
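The power claim made earlier in this reply can be checked roughly by simulation. The sketch below is only an illustration under assumed settings (n = 20, a true location of 1, Cauchy errors as the heavy-tailed case, 2000 replicates, nominal level 0.05); none of these numbers come from the thread, and sim_power is a hypothetical helper. It compares the rejection rate of the one-sample t-test with that of the Wilcoxon signed-rank test when the null hypothesis (location 0) is false.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
n_rep <- 2000    # number of simulated samples (assumed)
n     <- 20      # sample size (assumed, close to the thread's "less than 20")
shift <- 1       # true location, so the null hypothesis of 0 is false

# Estimate power of both tests for errors drawn by rgen(n).
sim_power <- function(rgen) {
  p <- replicate(n_rep, {
    x <- shift + rgen(n)
    c(t = t.test(x)$p.value, wilcoxon = wilcox.test(x)$p.value)
  })
  # p is a 2 x n_rep matrix; take the rejection rate per test
  apply(p, 1, function(pv) mean(pv < 0.05))
}

power_normal <- sim_power(rnorm)     # normal errors
power_heavy  <- sim_power(rcauchy)   # heavy-tailed (Cauchy) errors
print(rbind(normal = power_normal, cauchy = power_heavy))
```

Under these assumptions one would expect both tests to reject almost always for normal errors, and the t-test to reject noticeably less often than the signed-rank test for Cauchy errors; the exact figures depend on the seed and settings.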
LuckeJF> -----Original Message-----
LuckeJF> From: r-help-bounces at stat.math.ethz.ch
LuckeJF>   [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Liaw, Andy
LuckeJF> Sent: Friday, May 25, 2007 12:04 PM
LuckeJF> To: gatemaze at gmail.com; Frank E Harrell Jr
LuckeJF> Cc: r-help
LuckeJF> Subject: Re: [R] normality tests [Broadcast]
LuckeJF> From: gatemaze at gmail.com
>> On 25/05/07, Frank E Harrell Jr <f.harrell at vanderbilt.edu> wrote:
>> > gatemaze at gmail.com wrote:
>> > > Hi all,
>> > >
>> > > apologies for seeking advice on a general stats question. I've run
>> > > normality tests using 8 different methods:
>> > > - Lilliefors
>> > > - Shapiro-Wilk
>> > > - Robust Jarque Bera
>> > > - Jarque Bera
>> > > - Anderson-Darling
>> > > - Pearson chi-square
>> > > - Cramer-von Mises
>> > > - Shapiro-Francia
>> > >
>> > > All show that the null hypothesis that the data come from a normal
>> > > distro cannot be rejected. Great. However, I don't think it looks nice
>> > > to report the values of 8 different tests on a report. One note is
>> > > that my sample size is really tiny (less than 20 independent cases).
>> > > Without wanting to start a flame war, are there any advices of which
>> > > one/ones would be more appropriate and should be reported (along with
>> > > a Q-Q plot). Thank you.
>> > >
>> > > Regards,
>> > >
>> >
>> > Wow - I have so many concerns with that approach that it's hard to know
>> > where to begin. But first of all, why care about normality? Why not
>> > use distribution-free methods?
>> >
>> > You should examine the power of the tests for n=20. You'll probably
>> > find it's not good enough to reach a reliable conclusion.
>>
>> And wouldn't it be even worse if I used non-parametric tests?
LuckeJF> I believe what Frank meant was that it's probably
LuckeJF> better to use a distribution-free procedure to do
LuckeJF> the real test of interest (if there is one) instead
LuckeJF> of testing for normality, and then use a test that
LuckeJF> assumes normality.
LuckeJF> I guess the question is, what exactly do you want
LuckeJF> to do with the outcome of the normality tests? If
LuckeJF> those are going to be used as basis for deciding
LuckeJF> which test(s) to do next, then I concur with
LuckeJF> Frank's reservation.
LuckeJF> Generally speaking, I do not find goodness-of-fit
LuckeJF> for distributions very useful, mostly for the
LuckeJF> reason that failure to reject the null is no
LuckeJF> evidence in favor of the null. It's difficult for
LuckeJF> me to imagine why "there's insufficient evidence to
LuckeJF> show that the data did not come from a normal
LuckeJF> distribution" would be interesting.
LuckeJF> Andy
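Frank's suggestion to examine the power for n = 20 can be sketched by simulation. The code below is an illustration only: the choice of Shapiro-Wilk, the alternatives (t with 5 df, uniform, exponential), the 5000 replicates and the 0.05 level are all assumptions, and sw_reject_rate is a hypothetical helper.

```r
set.seed(2)      # arbitrary seed, for reproducibility only
n_rep <- 5000    # number of simulated samples (assumed)
n     <- 20      # sample size mentioned in the thread

# Share of simulated samples in which Shapiro-Wilk rejects normality at 5%.
sw_reject_rate <- function(rgen) {
  mean(replicate(n_rep, shapiro.test(rgen(n))$p.value < 0.05))
}

rates <- c(
  normal      = sw_reject_rate(rnorm),                       # null is true
  t_df5       = sw_reject_rate(function(n) rt(n, df = 5)),   # mildly heavy tails
  uniform     = sw_reject_rate(runif),                       # short tails
  exponential = sw_reject_rate(rexp)                         # strongly skewed
)
print(round(rates, 3))
```

The rate for normal data should sit near the nominal 0.05. A low rate for a non-normal alternative means that, at this sample size, a non-significant result says little about whether the data are normal, which is the point made above about failure to reject.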
>> > Frank
>> >
>> > --
>> > Frank E Harrell Jr   Professor and Chair           School of Medicine
>> >                      Department of Biostatistics   Vanderbilt University
>> >
>>
>>
>> --
>> yianni
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
>> read the posting guide
>> http://www.R-project.org/posting-guide.html and provide
>> commented, minimal, self-contained, reproducible code.