[R] Testing for normality of residuals in a regression model

Fri Oct 15 20:14:00 CEST 2004

Hi John,

Your point is well taken.  I was only thinking about the shape of the
distribution, and neglected cases such as symmetric long-tailed
distributions.  However, I'd still argue that other tools (e.g., robust
methods, as you mentioned) are probably more useful than normality tests.
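
The efficiency point for symmetric long-tailed errors can be illustrated
with a small simulation.  What follows is only a minimal sketch under
stated assumptions, not anything from the thread itself: it reduces the
regression to an intercept-only model, uses Laplace (double exponential)
errors as one example of a symmetric long-tailed distribution, and lets the
sample median stand in for a robust estimator.  The function names, sample
size, and replication count are all made up for illustration.

```python
import random
import statistics


def laplace(rng):
    """Draw one Laplace(0, 1) variate as the difference of two Exp(1) draws."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def compare_estimators(n=50, reps=2000, seed=1):
    """Return the sampling variances of the mean and the median.

    Intercept-only model (an assumption for illustration): the least
    squares estimate is the sample mean, and the sample median plays the
    role of a simple robust alternative.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [laplace(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)


var_mean, var_median = compare_estimators()
print(f"variance of LS estimate (mean):       {var_mean:.4f}")
print(f"variance of robust estimate (median): {var_median:.4f}")
```

For Laplace errors the median is asymptotically about twice as efficient as
the mean, so the second variance should come out clearly smaller, even
though a t-test based on the mean would still be roughly valid at this
sample size.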

To take the point a bit further, let's say we test for normality and it's
rejected.  What do we do then?  Well, if the non-normality is caused by
outliers, we can try robust methods.  If not, what do we do?  We can try to
see if some sort of transformation would bring the residuals closer to
normality, but if the interest is in inference on the coefficients, those
inferences on the `final' model are potentially invalid, because the
transformation was chosen by looking at the same data.  What's one to do
then?

Also, I was told by someone very smart that fitting OLS to data with
heteroscedastic errors can make the residuals look `more normal' than they
really are...  Don't know how true that is, though.
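
One way to probe that claim is by simulation.  The sketch below is a
starting point only, and it does not assert an answer: it assumes a simple
linear regression with normal errors whose standard deviation grows in
proportion to x, fits OLS by hand, and compares the average excess kurtosis
of the true errors with that of the fitted residuals.  The error model, the
coefficients, and all function names are assumptions made for illustration.

```python
import random
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis; close to 0 for normally distributed data."""
    centre = statistics.fmean(values)
    deviations = [v - centre for v in values]
    m2 = sum(d ** 2 for d in deviations) / len(deviations)
    m4 = sum(d ** 4 for d in deviations) / len(deviations)
    return m4 / (m2 ** 2) - 3.0


def simulate_fit(rng, n=30):
    """Simulate one heteroscedastic data set and fit y = a + b*x by OLS.

    Returns the true errors and the fitted residuals.
    """
    x = [rng.uniform(1.0, 10.0) for _ in range(n)]
    # Assumed error model: normal errors with sd proportional to x.
    errors = [rng.gauss(0.0, 0.5 * xi) for xi in x]
    y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, errors)]

    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return errors, residuals


def average_kurtosis(reps=500, seed=1):
    """Average excess kurtosis of errors and of residuals over many fits."""
    rng = random.Random(seed)
    error_k, residual_k = [], []
    for _ in range(reps):
        errors, residuals = simulate_fit(rng)
        error_k.append(excess_kurtosis(errors))
        residual_k.append(excess_kurtosis(residuals))
    return statistics.fmean(error_k), statistics.fmean(residual_k)


err_kurt, res_kurt = average_kurtosis()
print(f"mean excess kurtosis of true errors: {err_kurt:.3f}")
print(f"mean excess kurtosis of residuals:   {res_kurt:.3f}")
```

Swapping in a skewed or long-tailed error model, or changing n, would show
how much the comparison depends on these choices.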

Best,
Andy

> From: John Fox
> 
> Dear Andy,
> 
> At the risk of muddying the waters (and certainly without wanting to
> advocate the use of normality tests for residuals), I believe that your
> point #4 is subject to misinterpretation: That is, while it is true that
> t- and F-tests for regression coefficients in large samples retain their
> validity well when the errors are non-normal, the efficiency of the LS
> estimates can (depending upon the nature of the non-normality) be
> seriously compromised, not only absolutely but in relation to
> alternatives (e.g., robust regression).
> 
> Regards,
>  John
> 
> --------------------------------
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox 
> -------------------------------- 
> 
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch 
> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Liaw, Andy
> > Sent: Friday, October 15, 2004 11:55 AM
> > To: 'Federico Gherardini'; Berton Gunter
> > Cc: R-help mailing list
> > Subject: RE: [R] Testing for normality of residuals in a 
> > regression model
> > 
> > Let's see if I can get my stat 101 straight:
> > 
> > We learned that linear regression has a set of assumptions:
> > 
> > 1. Linearity of the relationship between X and y.
> > 2. Independence of errors.
> > 3. Homoscedasticity (equal error variance).
> > 4. Normality of errors.
> > 
> > Now, we should ask:  Why are they needed?  Can we get away 
> > with less?  What if some of them are not met?
> > 
> > It should be clear why we need #1.
> > 
> > Without #2, I believe the least squares estimator is still 
> > unbiased, but the usual estimates of the SEs for the coefficients 
> > are wrong, so the t-tests are wrong.
> > 
> > Without #3, the coefficients are, again, still unbiased, but 
> > not as efficient as can be.  Interval estimates for the 
> > prediction will surely be wrong.
> > 
> > Without #4, well, it depends.  If the residual DF is 
> > sufficiently large, the t-tests are still valid because of 
> > the CLT.  You do need normality if you have small residual DF.
> > 
> > The problem with normality tests, I believe, is that they 
> > usually have fairly low power at small sample sizes, so that 
> > doesn't quite help.  There's no free lunch:  A normality test 
> > with good power will usually have good power against a fairly 
> > narrow class of alternatives, and almost no power against 
> > others (directional test).  How do you decide what to use?
> > 
> > Has anyone seen a data set where the normality test on the 
> > residuals is crucial in coming up with an appropriate analysis?
> > 
> > Cheers,
> > Andy
> > 
> > > From: Federico Gherardini
> > > 
> > > Berton Gunter wrote:
> > > 
> > > >>>Exactly! My point is that normality tests are useless for this
> > > >>>purpose for reasons that are beyond what I can take up here.
> > > >>>
> > > Thanks for your suggestions, I understand that! Could you possibly
> > > give me some (not too complicated!) links so that I can investigate
> > > this matter further?
> > > 
> > > Cheers,
> > > 
> > > Federico
> > > 
> > > >>>Hints: Balanced designs are robust to non-normality;
> > > >>>independence (especially "clustering" of subjects due to
> > > >>>systematic effects), not normality, is usually the biggest real
> > > >>>statistical problem; hypothesis tests will always reject when
> > > >>>samples are large -- so what!; "trust" refers to prediction
> > > >>>validity, which has to do with study design and the
> > > >>>validity/representativeness of the current data to the future.
> > > >>>
> > > >>>I know that all the stats 101 texts say to test for normality,
> > > >>>but they're full of baloney!
> > > >>>
> > > >>>Of course, this is "free" advice -- so caveat emptor!
> > > >>>
> > > >>>Cheers,
> > > >>>Bert
> > > >>>
> > > 
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide! 
> > > http://www.R-project.org/posting-guide.html
> > > 
> > >