[R] Shapiro-Wilk for levels of factor

Mon Feb 15 22:56:05 CET 2010

Achim showed you how, but you might want to consider why.
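
For reference, one common way to get the "how" (a minimal sketch, not necessarily what Achim posted) is to let tapply() split the response by the factor and call shapiro.test on each piece. The data frame dat and the column names y and g are made up for illustration.

```r
# Hypothetical data: response y and a three-level factor g (names are assumptions)
set.seed(1)
dat <- data.frame(
  y = c(rnorm(20), rexp(20), rnorm(20, mean = 2)),
  g = factor(rep(c("a", "b", "c"), each = 20))
)

# Run shapiro.test on y separately within each level of g,
# without creating extra datasets
res <- tapply(dat$y, dat$g, shapiro.test)

# Pull out just the p-values, one per level
pvals <- sapply(res, function(x) x$p.value)
print(pvals)
```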

If you are trying to learn more about your data, then plots or other strategies may work better than the test.
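
As a rough sketch of the plotting route, something like the following draws one normal Q-Q plot per level of the factor. The data frame dat and the columns y and g are invented for the example; substitute your own.

```r
# Hypothetical data: response y and a three-level factor g (names are assumptions)
set.seed(1)
dat <- data.frame(
  y = c(rnorm(20), rexp(20), rnorm(20, mean = 2)),
  g = factor(rep(c("a", "b", "c"), each = 20))
)

# One normal Q-Q plot per level, side by side
op <- par(mfrow = c(1, nlevels(dat$g)))
for (lev in levels(dat$g)) {
  yl <- dat$y[dat$g == lev]          # response values for this level only
  qqnorm(yl, main = paste("Level", lev))
  qqline(yl)                         # reference line through the quartiles
}
par(op)                              # restore the previous plot layout
```

Points that bend away from the line suggest skewness or heavy tails, and you can see how far off they are, which a p-value alone does not tell you.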

If you are testing for normality in order to meet the assumptions of a test, then the normality test may not be accomplishing what you think. Normality assumptions matter most when sample sizes are small, but that is exactly when most normality tests have low power to detect non-normality (I only know of one that has high power in this case, but there are other issues with that one), so a non-significant result does not mean that your procedure is safe to use. As sample sizes get larger, the normality tests become more powerful, but the need for normality goes away (central limit theorem). So testing normality to satisfy assumptions is usually meaningless for small sample sizes, and meaningless in a different way for large samples. See fortune(234) and fortune(117) in the fortunes package.
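
If you want to see the power issue for yourself, a small simulation along these lines may help. It is only a sketch: the exponential distribution, the sample sizes 5 and 100, the 5% level, and the helper name reject_rate are all my choices for illustration.

```r
# Estimate how often shapiro.test rejects at the 5% level when the data
# are drawn from an exponential (clearly non-normal) distribution.
# reject_rate is a hypothetical helper written for this example.
set.seed(42)
reject_rate <- function(n, nsim = 1000) {
  mean(replicate(nsim, shapiro.test(rexp(n))$p.value < 0.05))
}

power_small <- reject_rate(5)     # small samples: rejection rate of the test
power_large <- reject_rate(100)   # large samples: rejection rate of the test

print(c(n5 = power_small, n100 = power_large))
```

Comparing the two rates shows how much more often the non-normality is caught in the larger samples, which is where it matters less.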

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Ravi Kulkarni
> Sent: Sunday, February 14, 2010 9:49 AM
> To: r-help at r-project.org
> Subject: [R] Shapiro-Wilk for levels of factor
> 
> Hello,
>   I have data for an ANOVA where the between-subjects factor has three
> levels. How do I run a test of normality (using shapiro.test) on each
> of the levels of the factor for the dependent variable separately
> without creating extra datasets?
> 
>   Thanks,
> 
>     Ravi