# [R] Testing for normality of residuals in a regression model

Liaw, Andy andy_liaw at merck.com
Fri Oct 15 18:55:03 CEST 2004

Let's see if I can get my stat 101 straight:

We learned that linear regression has a set of assumptions:

1. Linearity of the relationship between X and y.
2. Independence of errors.
3. Homoscedasticity (equal error variance).
4. Normality of errors.

Now, we should ask:  Why are they needed?  Can we get away with less?  What
if some of them are not met?

It should be clear why we need #1.

Without #2, I believe the least squares estimator is still unbiased, but the
usual estimates of the SEs for the coefficients are wrong, so the t-tests
are wrong.
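A small simulation can check this claim about #2. It is a Python sketch using only the standard library. `lm()` in R would do the same job, and every name below is made up for illustration. The assumptions are that both the predictor and the errors follow a stationary AR(1) process with phi = 0.8, and that the true slope is 2. Under these assumptions the average slope stays close to 2. The usual SE formula, however, comes out well below the slope's actual spread across replications.

```python
import math
import random
import statistics


def ols_slope_and_se(x, y):
    """Simple regression y = b0 + b1*x; returns (b1, textbook SE of b1)."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    # This SE formula assumes independent, equal-variance errors.
    return b1, math.sqrt(rss / (n - 2) / sxx)


def ar1(n, phi, rng):
    """Stationary AR(1) series with unit innovation variance (hypothetical helper)."""
    z = [rng.gauss(0, 1) / math.sqrt(1 - phi ** 2)]
    for _ in range(n - 1):
        z.append(phi * z[-1] + rng.gauss(0, 1))
    return z


def simulate(n=100, phi=0.8, beta=2.0, reps=1000, seed=1):
    rng = random.Random(seed)
    slopes, naive_ses = [], []
    for _ in range(reps):
        x = ar1(n, phi, rng)  # positively autocorrelated predictor
        e = ar1(n, phi, rng)  # positively autocorrelated errors: #2 violated
        y = [1.0 + beta * xi + ei for xi, ei in zip(x, e)]
        b1, se = ols_slope_and_se(x, y)
        slopes.append(b1)
        naive_ses.append(se)
    return (statistics.fmean(slopes), statistics.stdev(slopes),
            statistics.fmean(naive_ses))


mean_b1, true_sd, avg_naive_se = simulate()
print(f"mean slope       {mean_b1:.3f}  (true value 2)")
print(f"actual SD        {true_sd:.3f}")
print(f"average naive SE {avg_naive_se:.3f}")
```

How large the understatement gets depends on both autocorrelations. If only one of x or the errors were autocorrelated, the naive SE would be much closer to right.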

Without #3, the coefficient estimates are, again, still unbiased, but not as
efficient as they could be.  Interval estimates for predictions will surely
be wrong.
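The efficiency point can be seen by comparing ordinary least squares with weighted least squares on the same heteroscedastic data. This is a Python sketch with several assumptions. The design is fixed, with x running from 1 to 20.5. The error SD is taken to be 0.5*x. The WLS weights 1/SD^2 are assumed known, which in practice they rarely are exactly. Under these assumptions both estimators average close to the true slope of 2, and the OLS estimates scatter noticeably more.

```python
import random
import statistics


def wls_slope(x, y, w):
    """Weighted least-squares slope for y = b0 + b1*x; all weights 1 gives OLS."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    return num / den


def simulate(beta=2.0, reps=2000, seed=2):
    rng = random.Random(seed)
    x = [1 + 0.5 * i for i in range(40)]  # fixed design, x from 1 to 20.5
    sd = [0.5 * xi for xi in x]           # error SD grows with x: #3 violated
    ones = [1.0] * len(x)
    inv_var = [1 / s ** 2 for s in sd]    # assumes the variance pattern is known
    ols, wls = [], []
    for _ in range(reps):
        y = [1.0 + beta * xi + rng.gauss(0, s) for xi, s in zip(x, sd)]
        ols.append(wls_slope(x, y, ones))
        wls.append(wls_slope(x, y, inv_var))
    return (statistics.fmean(ols), statistics.stdev(ols),
            statistics.fmean(wls), statistics.stdev(wls))


ols_mean, ols_sd, wls_mean, wls_sd = simulate()
print(f"OLS: mean {ols_mean:.3f}, SD {ols_sd:.3f}")
print(f"WLS: mean {wls_mean:.3f}, SD {wls_sd:.3f}")
```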

Without #4, well, it depends.  If the residual DF is sufficiently large, the
t-tests are still approximately valid because of the CLT.  You do need
normality if you have small residual DF.
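One way to check the large-DF claim is to estimate the Type I error of the slope t-test when the errors are clearly non-normal. This is a Python sketch with these assumptions:
- The errors are centred exponential, which is strongly skewed.
- The sample size is n = 200.
- The true slope is 0.
- The normal 1.96 cutoff stands in for the t(198) critical value of about 1.97.

If the CLT argument holds, the rejection rate should land near the nominal 5%.

```python
import math
import random
import statistics


def slope_t(x, y):
    """Usual OLS t statistic for H0: slope = 0 in y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b1 / math.sqrt(rss / (n - 2) / sxx)


def type1_rate(n=200, reps=2000, seed=3):
    rng = random.Random(seed)
    # Normal approximation to the t(n-2) critical value; reasonable for large DF.
    crit = statistics.NormalDist().inv_cdf(0.975)
    rejections = 0
    for _ in range(reps):
        x = [rng.uniform(0, 1) for _ in range(n)]
        # True slope is 0; errors are exponential(1) minus 1, i.e. skewed (#4 violated).
        y = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
        if abs(slope_t(x, y)) > crit:
            rejections += 1
    return rejections / reps


rate = type1_rate()
print(f"empirical Type I error at nominal 5%: {rate:.3f}")
```

Rerunning with a small n, such as 10, is a quick way to see how far the rate drifts once the residual DF is small. How far it drifts depends on the design and the error distribution.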

The problem with normality tests, I believe, is that they usually have
fairly low power at small sample sizes, which is exactly when normality
matters, so they don't quite help.  There's no free lunch:  A normality test
with good power against one fairly narrow class of alternatives will usually
have almost no power against others (a directional test).  How do you decide
which one to use?
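The power problem can be illustrated with a simple moment-based normality test. The Jarque-Bera test is a stand-in here, chosen because its asymptotic p-value has a closed form (the chi-square(2) survival function is exp(-x/2)). Nobody in this thread proposed it, and in R one would more likely reach for `shapiro.test()`. This Python sketch has further assumptions:
- It tests raw samples rather than regression residuals.
- The alternative is a mildly skewed Gamma(shape = 8) distribution.
- Each sample size gets 300 replications.

At n = 15 the test rarely detects the departure, partly because the chi-square approximation is itself rough there. At n = 1000 the same mild skewness is rejected almost every time.

```python
import math
import random
import statistics


def jarque_bera_p(sample):
    """Asymptotic Jarque-Bera p-value based on sample skewness and kurtosis."""
    n = len(sample)
    m = statistics.fmean(sample)
    d = [v - m for v in sample]
    m2 = sum(v ** 2 for v in d) / n
    m3 = sum(v ** 3 for v in d) / n
    m4 = sum(v ** 4 for v in d) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return math.exp(-jb / 2)  # chi-square(2) upper tail


def rejection_rate(n, reps=300, alpha=0.05, seed=4):
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Mildly skewed alternative: Gamma(shape=8), skewness about 0.71.
        sample = [rng.gammavariate(8.0, 1.0) for _ in range(n)]
        if jarque_bera_p(sample) < alpha:
            hits += 1
    return hits / reps


power_small = rejection_rate(15)
power_large = rejection_rate(1000)
print(f"n=15:   rejects {power_small:.2f} of the time")
print(f"n=1000: rejects {power_large:.2f} of the time")
```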

Has anyone seen a data set where the normality test on the residuals is
crucial in coming up with an appropriate analysis?

Cheers,
Andy

> From: Federico Gherardini
>
> Berton Gunter wrote:
>
> >>>Exactly! My point is that normality tests are useless for this purpose
> >>>for reasons that are beyond what I can take up here.
> >>>
> Thanks for your suggestions, I understand that! Could you possibly give
> me some (not too complicated!) links so that I can investigate this
> matter further?
>
> Cheers,
>
> Federico
>
> >>>Hints: Balanced designs are robust to non-normality; independence
> >>>(especially "clustering" of subjects due to systematic effects), not
> >>>normality, is usually the biggest real statistical problem; hypothesis
> >>>tests will always reject when samples are large -- so what!; "trust"
> >>>refers to prediction validity, which has to do with study design and
> >>>the validity/representativeness of the current data to the future.
> >>>
> >>>I know that all the stats 101 texts say to test for normality, but
> >>>they're full of baloney!
> >>>
> >>>Of course, this is "free" advice -- so caveat emptor!
> >>>
> >>>Cheers,
> >>>Bert
> >>>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help