[R] About normality tests...

Ralf B ralf.bierig at gmail.com
Wed Jun 23 20:05:21 CEST 2010


Hi all,

I have two very large samples of data (10000+ data points) and would
like to perform normality tests on them. I know that p < .05 means that
a data set is considered not normal with either of the two tests I am
looking at (Shapiro-Wilk and Kolmogorov-Smirnov). I am also aware that
large samples are more likely to produce normal results
(Andy Field, 2005).

I have a few questions to make sure that I am using the tests correctly.

1) The Shapiro-Wilk test requires me to provide a mean and sd. Is it
correct to supply the mean and sd of the data itself (since I am
comparing to a normal distribution with the same parameters)?

mySD <- sd(mydata$myfield)
myMean <- mean(mydata$myfield)
shapiro.test(rnorm(100, mean = myMean, sd = mySD))
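
For comparison, here is a minimal sketch of calling shapiro.test() on
the data directly. As far as I can tell from its help page, the function
takes only the data vector (no mean or sd arguments) and accepts between
3 and 5000 values, so a sample of 10000+ points would have to be
subsampled first. The vector x is simulated stand-in data, and the
subsample size of 5000 is an assumption chosen only to fit that limit.

```r
set.seed(1)                              # reproducible illustration
x <- rnorm(10000, mean = 50, sd = 10)    # simulated stand-in for mydata$myfield

# shapiro.test() is given the data itself; it has no mean or sd argument.
# It accepts 3 to 5000 non-missing values, so draw a random subsample.
x_sub <- sample(x, 5000)
sw <- shapiro.test(x_sub)

print(sw$statistic)   # the W statistic
print(sw$p.value)     # p-value for the null hypothesis of normality
```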

2) If I just want to test each distribution individually, I assume
that I am doing a one-sample Kolmogorov-Smirnov test. Is that correct?

3) If I simply want to know whether the data are normal or not, what
should I pass for the parameter 'alternative'? Does it actually matter?

alternative = c("two.sided", "less", "greater")
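
To make questions 2 and 3 concrete, here is a minimal sketch of the
one-sample call I have in mind, assuming ks.test() with "pnorm" as the
reference distribution and alternative = "two.sided". Again, x is
simulated stand-in data. Note the assumption that the mean and sd are
estimated from the same data: the ks.test() help page says the
parameters should be pre-specified, so the p-value from this call may
not be valid.

```r
set.seed(2)                              # reproducible illustration
x <- rnorm(10000, mean = 50, sd = 10)    # simulated stand-in for mydata$myfield

# One-sample Kolmogorov-Smirnov test against a normal distribution.
# mean and sd are passed through to pnorm(); here they are estimated
# from x itself, which is an assumption of this sketch.
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x),
              alternative = "two.sided")

print(ks$statistic)   # the D statistic
print(ks$p.value)
```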

Thank you,
Ralf


