[R] normality tests [Broadcast]

Frank E Harrell Jr f.harrell at vanderbilt.edu
Fri May 25 23:42:26 CEST 2007


Lucke, Joseph F wrote:
>  Most standard tests, such as t-tests and ANOVA, are fairly resistant to
> non-normality for significance testing. It's the sample means that have
> to be normal, not the data.  The CLT kicks in fairly quickly.  Testing
> for normality prior to choosing a test statistic is generally not a good
> idea. 

I beg to differ, Joseph.  I have had many datasets in which the CLT was
of no use whatsoever, i.e., where bootstrap confidence limits were
asymmetric because the data were so skewed, and where symmetric
normality-based confidence intervals had bad coverage in both tails
(though correct on the average).  I see this the opposite way:
nonparametric tests work fine if normality holds.
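
As a rough illustration of the tail-coverage point (a sketch added for
this write-up, not code from the original post, and in Python rather
than R), the simulation below draws samples of n=20 from a lognormal
distribution and counts how often the usual symmetric t-interval misses
the true mean on each side.  The lognormal parameters, the seed, the
number of replications and the function name tail_miss_rates are all
assumptions chosen for the example.

```python
import math
import random
import statistics

N = 20          # sample size (assumed, matching the thread's "n=20")
T_CRIT = 2.093  # 0.975 quantile of Student's t with N - 1 = 19 df


def tail_miss_rates(reps=5000, seed=1):
    """Return (below, above): the fraction of nominal 95% t-intervals whose
    lower limit lies above the true mean, and the fraction whose upper
    limit lies below it."""
    rng = random.Random(seed)
    true_mean = math.exp(0.5)  # mean of a lognormal with mu=0, sigma=1
    below = above = 0
    for _ in range(reps):
        x = [rng.lognormvariate(0.0, 1.0) for _ in range(N)]
        m = statistics.fmean(x)
        se = statistics.stdev(x) / math.sqrt(N)
        if true_mean < m - T_CRIT * se:
            below += 1   # true mean falls below the lower limit
        elif true_mean > m + T_CRIT * se:
            above += 1   # true mean falls above the upper limit
    return below / reps, above / reps


if __name__ == "__main__":
    below, above = tail_miss_rates()
    print(f"true mean below lower limit: {below:.3f}")
    print(f"true mean above upper limit: {above:.3f}")
```

If the interval behaved as advertised, each rate would be near 0.025.
With data this skewed the two rates are typically far apart, which is
the kind of one-sided failure a symmetric interval cannot show.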

Note that the CLT helps with type I error but not so much with type II 
error.
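
To make the type I versus type II distinction concrete, here is a
second hedged sketch (again Python, standard library only, and not
part of the original message).  It compares a pooled two-sample t-test
with a Wilcoxon rank-sum test (normal approximation) on lognormal data
with 20 observations per group.  The shift of 0.8 on the log scale, the
seed, and the helper names t_rejects, wilcoxon_rejects and
rejection_rates are assumptions made for the example.

```python
import math
import random
import statistics

N = 20          # observations per group (assumed)
T_CRIT = 2.024  # 0.975 quantile of Student's t with 2*N - 2 = 38 df
Z_CRIT = 1.96   # 0.975 quantile of the standard normal


def t_rejects(x, y):
    """Pooled two-sample t-test, two-sided 5% level, equal group sizes."""
    pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(pooled_var * 2 / N)
    return abs(t) > T_CRIT


def wilcoxon_rejects(x, y):
    """Wilcoxon rank-sum test via the normal approximation (no tie
    correction, which is fine for continuous simulated data)."""
    labelled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(rank for rank, (_, g) in enumerate(labelled, start=1) if g == 0)
    mean_w = N * (2 * N + 1) / 2
    sd_w = math.sqrt(N * N * (2 * N + 1) / 12)
    return abs(w - mean_w) / sd_w > Z_CRIT


def rejection_rates(log_shift, reps=2000, seed=1):
    """Return (t_rate, wilcoxon_rate) of rejections when the second group
    is shifted by log_shift on the log scale; log_shift=0 is the null."""
    rng = random.Random(seed)
    t_hits = w_hits = 0
    for _ in range(reps):
        x = [rng.lognormvariate(0.0, 1.0) for _ in range(N)]
        y = [rng.lognormvariate(log_shift, 1.0) for _ in range(N)]
        t_hits += t_rejects(x, y)
        w_hits += wilcoxon_rejects(x, y)
    return t_hits / reps, w_hits / reps


if __name__ == "__main__":
    print("type I error (t, Wilcoxon):", rejection_rates(0.0))
    print("power        (t, Wilcoxon):", rejection_rates(0.8))
```

Under the null both rejection rates should stay in the neighborhood of
the nominal 5%, whereas under the shifted alternative the rank test
typically rejects noticeably more often than the t-test for data this
skewed.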

Frank

> 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Liaw, Andy
> Sent: Friday, May 25, 2007 12:04 PM
> To: gatemaze at gmail.com; Frank E Harrell Jr
> Cc: r-help
> Subject: Re: [R] normality tests [Broadcast]
> 
> From: gatemaze at gmail.com
>> On 25/05/07, Frank E Harrell Jr <f.harrell at vanderbilt.edu> wrote:
>>> gatemaze at gmail.com wrote:
>>>> Hi all,
>>>>
>>>> apologies for seeking advice on a general stats question. I've run
>>>> normality tests using 8 different methods:
>>>> - Lilliefors
>>>> - Shapiro-Wilk
>>>> - Robust Jarque Bera
>>>> - Jarque Bera
>>>> - Anderson-Darling
>>>> - Pearson chi-square
>>>> - Cramer-von Mises
>>>> - Shapiro-Francia
>>>>
>>>> All show that the null hypothesis that the data come from a normal
>>>> distro cannot be rejected. Great. However, I don't think it looks nice
>>>> to report the values of 8 different tests in a report. One note is
>>>> that my sample size is really tiny (fewer than 20 independent cases).
>>>> Without wanting to start a flame war, is there any advice on which
>>>> one/ones would be more appropriate and should be reported (along with
>>>> a Q-Q plot)? Thank you.
>>>>
>>>> Regards,
>>>>
>>> Wow - I have so many concerns with that approach that it's hard to know
>>> where to begin.  But first of all, why care about normality?  Why not
>>> use distribution-free methods?
>>>
>>> You should examine the power of the tests for n=20.  You'll probably
>>> find it's not good enough to reach a reliable conclusion.
>> And wouldn't it be even worse if I used non-parametric tests?
> 
> I believe what Frank meant was that it's probably better to use a
> distribution-free procedure to do the real test of interest (if there is
> one) instead of testing for normality and then using a test that assumes
> normality.
> 
> I guess the question is, what exactly do you want to do with the outcome
> of the normality tests?  If those are going to be used as a basis for
> deciding which test(s) to do next, then I concur with Frank's
> reservation.
> 
> Generally speaking, I do not find goodness-of-fit for distributions very
> useful, mostly for the reason that failure to reject the null is no
> evidence in favor of the null.  It's difficult for me to imagine why
> "there's insufficient evidence to show that the data did not come from a
> normal distribution" would be interesting.
> 
> Andy
> 
>  
>>> Frank
>>>
>>>
>>> --
>>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>>                       Department of Biostatistics   Vanderbilt University
>>
>> --
>> yianni
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


