[R] Testing for normality of residuals in a regression model

Fri Oct 15 19:56:49 CEST 2004

Dear Andy,

At the risk of muddying the waters (and certainly without wanting to
advocate the use of normality tests for residuals), I believe that your
point #4 is subject to misinterpretation: That is, while it is true that t-
and F-tests for regression coefficients in large samples retain their
validity well when the errors are non-normal, the efficiency of the LS
estimates can (depending upon the nature of the non-normality) be seriously
compromised, not only absolutely but in relation to alternatives (e.g.,
robust regression).
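
A rough simulation sketch of the efficiency point, using rlm() from the MASS package as one example of a robust alternative. The sample size, number of replications, t-distributed errors with 2 df, and true coefficients are all illustrative assumptions, not taken from any real data set:

```r
# Sketch: sampling variability of the slope under heavy-tailed errors,
# least squares (lm) versus a robust M-estimator (MASS::rlm).
# All settings below (n, reps, t errors with 2 df, true coefficients)
# are illustrative assumptions.
library(MASS)

set.seed(123)
n    <- 100        # observations per simulated data set
reps <- 500        # number of simulated data sets
x    <- rnorm(n)   # fixed design, reused in every replication

slopes <- replicate(reps, {
  y <- 1 + 2 * x + rt(n, df = 2)   # true slope 2, heavy-tailed errors
  c(ls     = unname(coef(lm(y ~ x))[2]),
    robust = unname(coef(rlm(y ~ x, maxit = 50))[2]))
})

# Variance of each slope estimator across replications, and their ratio
slope_var <- apply(slopes, 1, var)
slope_var
slope_var["ls"] / slope_var["robust"]
```

A ratio well above 1 indicates that the LS slope is much more variable than the robust one for this error distribution. Replacing rt(n, df = 2) with rnorm(n) should give a ratio slightly below 1, since LS is fully efficient when the errors are normal.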

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Liaw, Andy
> Sent: Friday, October 15, 2004 11:55 AM
> To: 'Federico Gherardini'; Berton Gunter
> Cc: R-help mailing list
> Subject: RE: [R] Testing for normality of residuals in a 
> regression model
> 
> Let's see if I can get my stat 101 straight:
> 
> We learned that linear regression has a set of assumptions:
> 
> 1. Linearity of the relationship between X and y.
> 2. Independence of errors.
> 3. Homoscedasticity (equal error variance).
> 4. Normality of errors.
> 
> Now, we should ask:  Why are they needed?  Can we get away 
> with less?  What if some of them are not met?
> 
> It should be clear why we need #1.
> 
> Without #2, I believe the least squares estimator is still 
> unbiased, but the usual estimates of the SEs for the coefficients 
> are wrong, so the t-tests are wrong.
> 
> Without #3, the coefficients are, again, still unbiased, but 
> not as efficient as can be.  Interval estimates for the 
> prediction will surely be wrong.
> 
> Without #4, well, it depends.  If the residual DF is 
> sufficiently large, the t-tests are still valid because of 
> the CLT.  You do need normality if you have small residual DF.
> 
> The problem with normality tests, I believe, is that they 
> usually have fairly low power at small sample sizes, so that 
> doesn't quite help.  There's no free lunch:  A normality test 
> with good power will usually have good power against a fairly 
> narrow class of alternatives, and almost no power against 
> others (directional test).  How do you decide what to use?
> 
> Has anyone seen a data set where the normality test on the 
> residuals is crucial in coming up with an appropriate analysis?
> 
> Cheers,
> Andy
> 
> > From: Federico Gherardini
> > 
> > Berton Gunter wrote:
> > 
> > >>>Exactly! My point is that normality tests are useless for this
> > >>>purpose for reasons that are beyond what I can take up here.
> > >>>
> > Thanks for your suggestions, I understand that! Could you possibly 
> > give me some (not too complicated!) links so that I can investigate 
> > this matter further?
> > 
> > Cheers,
> > 
> > Federico
> > 
> > >>>Hints: Balanced designs are robust to non-normality; independence
> > >>>(especially "clustering" of subjects due to systematic effects),
> > >>>not normality is usually the biggest real statistical problem;
> > >>>hypothesis tests will always reject when samples are large -- so
> > >>>what!; "trust" refers to prediction validity which has to do with
> > >>>study design and the validity/representativeness of the current
> > >>>data to future.
> > >>>
> > >>>I know that all the stats 101 tests say to test for normality,
> > >>>but they're full of baloney!
> > >>>
> > >>>Of course, this is "free" advice -- so caveat emptor!
> > >>>
> > >>>Cheers,
> > >>>Bert
> > >>>
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
> > 
> >