[R] Testing for normality of residuals in a regression model

Liaw, Andy andy_liaw at merck.com
Fri Oct 15 18:55:03 CEST 2004


Let's see if I can get my stat 101 straight:

We learned that linear regression has a set of assumptions:

1. Linearity of the relationship between X and y.
2. Independence of errors.
3. Homoscedasticity (equal error variance).
4. Normality of errors.
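Written out in the usual notation for n observations with predictor vectors x_i (notation assumed here, not taken from the post), the four assumptions are:

```latex
\begin{aligned}
&\text{1. Linearity:} && y_i = x_i^\top \beta + \varepsilon_i, \quad \mathrm{E}[\varepsilon_i \mid X] = 0 \\
&\text{2. Independence:} && \varepsilon_1, \dots, \varepsilon_n \text{ independent} \\
&\text{3. Homoscedasticity:} && \mathrm{Var}(\varepsilon_i \mid X) = \sigma^2 \text{ for all } i \\
&\text{4. Normality:} && \varepsilon_i \sim N(0, \sigma^2)
\end{aligned}
```

Assumptions 1 to 3 (with 2 weakened to uncorrelated errors) are what the Gauss-Markov theorem needs for least squares to be the best linear unbiased estimator; 4 is what makes the t and F distributions exact in small samples.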

Now, we should ask:  Why are they needed?  Can we get away with less?  What
if some of them are not met?

It should be clear why we need #1.

Without #2, I believe the least squares estimator is still unbiased, but the
usual estimates of the coefficients' SEs are wrong, so the t-tests are
wrong.
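To see the point about #2 concretely, here is a small simulation sketch in Python (standard library only). It assumes AR(1) errors with a hypothetical autocorrelation rho = 0.8 and a smoothly trending regressor; the function names and settings are made up for illustration. Across replications the slope estimates average out near the true value, while the textbook SE formula, which assumes independent errors, understates their actual spread.

```python
import math
import random
import statistics


def ols_slope_and_se(x, y):
    """Simple-regression OLS slope and its usual (independent-error) SE."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)  # residual DF = n - 2
    return b1, math.sqrt(sigma2_hat / sxx)


def ar1_errors(n, rho, rng):
    """Stationary AR(1) errors: e_t = rho * e_{t-1} + N(0, 1) innovation."""
    e = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho ** 2))]
    for _ in range(n - 1):
        e.append(rho * e[-1] + rng.gauss(0.0, 1.0))
    return e


def simulate(n=50, rho=0.8, beta=2.0, reps=2000, seed=1):
    rng = random.Random(seed)
    x = [t / n for t in range(n)]  # hypothetical trending regressor
    slopes, naive_ses = [], []
    for _ in range(reps):
        e = ar1_errors(n, rho, rng)
        y = [1.0 + beta * xi + ei for xi, ei in zip(x, e)]
        b1, se = ols_slope_and_se(x, y)
        slopes.append(b1)
        naive_ses.append(se)
    return {
        "mean_slope": statistics.fmean(slopes),        # near beta: unbiased
        "actual_sd": statistics.stdev(slopes),         # true sampling spread
        "mean_naive_se": statistics.fmean(naive_ses),  # what the usual formula reports
    }


if __name__ == "__main__":
    print(simulate())
```

Comparing "actual_sd" with "mean_naive_se" shows how far off the usual SEs (and hence the t-tests) are under this kind of dependence.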

Without #3, the coefficients are, again, still unbiased, but not as
efficient as they could be.  Interval estimates for predictions will surely
be wrong.
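A similar sketch for #3 (Python standard library only; names and settings are hypothetical): the error SD is assumed proportional to x, and ordinary least squares is compared with weighted least squares using the true inverse-variance weights, which in real data would have to be estimated. Both slope estimates center on the true value, but the weighted one has a smaller spread, which is the efficiency loss described above.

```python
import random
import statistics


def weighted_slope(x, y, w):
    """Slope of a weighted least squares line; all-ones weights give plain OLS."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return num / den


def simulate(n=40, beta=2.0, reps=3000, seed=2):
    rng = random.Random(seed)
    x = [1.0 + 9.0 * i / (n - 1) for i in range(n)]  # x spread over [1, 10]
    sd = x                                            # assumption: error SD proportional to x
    ones = [1.0] * n                                  # OLS weights
    inv_var = [1.0 / s ** 2 for s in sd]              # WLS weights (known here)
    ols, wls = [], []
    for _ in range(reps):
        y = [1.0 + beta * xi + rng.gauss(0.0, si) for xi, si in zip(x, sd)]
        ols.append(weighted_slope(x, y, ones))
        wls.append(weighted_slope(x, y, inv_var))
    return {
        "ols_mean": statistics.fmean(ols), "ols_sd": statistics.stdev(ols),
        "wls_mean": statistics.fmean(wls), "wls_sd": statistics.stdev(wls),
    }


if __name__ == "__main__":
    print(simulate())
```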

Without #4, well, it depends.  If the residual DF is sufficiently large, the
t-tests are still approximately valid because of the CLT.  You do need
normality if you have small residual DF.
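One way to see both halves of this (a sketch, not from the post): take the simplest regression, an intercept-only model y = mu + e, with skewed errors (centered exponential, an assumption chosen for illustration), and count how often the nominal 95% t interval for mu misses on each side. The t critical values are hardcoded from standard tables for residual DF 4 and 99 so no SciPy is needed; other sample sizes would need their own entries.

```python
import math
import random
import statistics

# Two-sided 95% t critical values from standard t tables, keyed by residual DF.
T_CRIT = {4: 2.776, 99: 1.984}


def t_interval_misses(n, reps=10000, mu=3.0, seed=3):
    """Intercept-only model y = mu + e with centered exponential errors.
    Returns the fractions of 95% t intervals lying entirely above mu and
    entirely below mu. Only n = 5 and n = 100 are supported by T_CRIT."""
    rng = random.Random(seed)
    tcrit = T_CRIT[n - 1]  # residual DF = n - 1
    below = above = 0
    for _ in range(reps):
        y = [mu + rng.expovariate(1.0) - 1.0 for _ in range(n)]
        m = statistics.fmean(y)
        s = math.sqrt(sum((v - m) ** 2 for v in y) / (n - 1))
        half = tcrit * s / math.sqrt(n)
        if mu < m - half:
            below += 1  # mu falls below the interval
        elif mu > m + half:
            above += 1  # mu falls above the interval
    return below / reps, above / reps


if __name__ == "__main__":
    for n in (5, 100):
        lo, hi = t_interval_misses(n)
        print(n, "coverage", 1 - lo - hi, "misses below/above", lo, hi)
```

With 99 residual DF the overall coverage comes out close to 0.95; with 4 residual DF the misses pile up on one side, so one-sided tests and intervals in particular are unreliable.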

The problem with normality tests, I believe, is that they usually have
fairly low power at small sample sizes, which is exactly when normality
matters, so they don't help much.  There's no free lunch:  A normality test
with good power usually has it only against a fairly narrow class of
alternatives, and almost no power against others (a directional test).  How
do you decide which one to use?
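To put some numbers on the power issue, here is a sketch using the Jarque-Bera test (a swapped-in choice; in R the more common call would be shapiro.test() on the model residuals). Jarque-Bera is built only from sample skewness and kurtosis, so by construction it targets those kinds of departure. The code uses the asymptotic chi-square(2) cutoff, which also makes the test conservative at small n; the alternative (exponential draws standing in for residuals), the sample sizes and the function names are assumptions for illustration.

```python
import math
import random

# Asymptotic 5% cutoff of chi-square(2): P(X > c) = exp(-c / 2) = 0.05,
# so c = -2 ln 0.05 (about 5.99).
JB_CRIT = -2.0 * math.log(0.05)


def jarque_bera(r):
    """Jarque-Bera statistic from the sample skewness and kurtosis of r."""
    n = len(r)
    m = sum(r) / n
    m2 = sum((v - m) ** 2 for v in r) / n
    m3 = sum((v - m) ** 3 for v in r) / n
    m4 = sum((v - m) ** 4 for v in r) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def power(n, reps=2000, seed=4):
    """Rejection rate at the 5% level when the 'residuals' are n exponential
    draws (a strongly skewed, hypothetical alternative)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        r = [rng.expovariate(1.0) for _ in range(n)]
        if jarque_bera(r) > JB_CRIT:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for n in (10, 30, 200):
        print(n, power(n))
```

Even against this strongly skewed alternative, the rejection rate at n = 10 is far from 1, while at n = 200 it is close to 1.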

Has anyone seen a data set where the normality test on the residuals was
crucial in coming up with the appropriate analysis?

Cheers,
Andy

> From: Federico Gherardini
> 
> Berton Gunter wrote:
> 
> >>>Exactly! My point is that normality tests are useless for this purpose for
> >>>reasons that are beyond what I can take up here.
> >>>
> Thanks for your suggestions; I understand that! Could you possibly give
> me some (not too complicated!) links so that I can investigate this
> matter further?
> 
> Cheers,
> 
> Federico
> 
> >>>Hints: Balanced designs are robust to non-normality; independence
> >>>(especially "clustering" of subjects due to systematic effects), not
> >>>normality, is usually the biggest real statistical problem; hypothesis
> >>>tests will always reject when samples are large -- so what!; "trust"
> >>>refers to prediction validity, which has to do with study design and the
> >>>validity/representativeness of the current data to the future.
> >>>
> >>>I know that all the stats 101 texts say to test for normality, but
> >>>they're full of baloney!
> >>>
> >>>Of course, this is "free" advice -- so caveat emptor!
> >>>
> >>>Cheers,
> >>>Bert
> >>>
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html