[R] Testing for normality of residuals in a regression model

Philippe Grosjean phgrosjean at sciviews.org
Sat Oct 16 13:18:48 CEST 2004

> Prof Brian Ripley wrote:
> >>However, stats 901 or some such tells you that if the distributions 
> >>have even slightly longer tails than the normal you can get much 
> >>better estimates than OLS, and this happens even before a test of 
> >>normality rejects on a sample size of thousands.
> >>
> >>Robustness of efficiency is much more important than robustness of
> >>distribution, and I believe robustness concepts should be in stats 101.
> >>(I was teaching them yesterday in the third lecture of a basic course,
> >>albeit a graduate course.)

Federico Gherardini answered:
> This is a very interesting discussion. So you are basically 
> saying that it's better to use robust regression methods, 
> without having to worry too much about the distribution of 
> residuals, instead of using standard methods and doing a lot 
> of tests to check for normality? Did I get your point?
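Prof. Ripley's point can be illustrated with a small simulation (my own sketch, not from the thread; it assumes MASS::rlm with its default Huber M-estimator as the robust fit, and t-distributed errors with 3 df as "slightly longer tails"):

```r
## Sketch: with errors only slightly heavier-tailed than normal,
## a robust fit beats OLS in efficiency, even though a normality
## test on the residuals often fails to reject at this sample size.
library(MASS)

set.seed(1)
nsim <- 500; n <- 50; beta <- 2
res <- replicate(nsim, {
  x <- rnorm(n)
  y <- beta * x + rt(n, df = 3)          # slightly longer tails than normal
  fit <- lm(y ~ x)
  c(ols   = coef(fit)[2],                # OLS slope estimate
    huber = coef(rlm(y ~ x))[2],         # Huber M-estimate of the slope
    p     = shapiro.test(residuals(fit))$p.value)
})
apply(res[1:2, ], 1, var)   # the Huber slope typically has smaller variance
mean(res[3, ] < 0.05)       # fraction of runs where normality is rejected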

My feeling is that symmetry is more important than, say, non-zero kurtosis in
the error. Is this correct? Now the problem is: the smaller the number of
observations, the more severe the effect of non-normality (or at least of
asymmetry?) on the regression can be, AND at the same time the power of tests
to detect non-normality drops. So I can easily imagine situations where
non-normality is not detected, yet the asymmetry is such that the regression
is significantly biased... From this point of view, it is mainly a question
of sample size... But not only:
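The drop in power at small sample sizes is easy to see by simulation (a sketch under an assumed, strongly skewed error: a centred exponential):

```r
## Sketch: power of shapiro.test() on OLS residuals against a
## skewed (centred exponential) error, as a function of sample size.
set.seed(2)
power <- function(n, nsim = 1000) {
  mean(replicate(nsim, {
    x <- runif(n)
    y <- 1 + 2 * x + (rexp(n) - 1)       # skewed, mean-zero error
    shapiro.test(residuals(lm(y ~ x)))$p.value < 0.05
  }))
}
sapply(c(10, 20, 50, 200), power)        # power rises with n
```

At the smallest sizes the test misses the skewness most of the time, which is exactly the dangerous regime described above.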

Andy Liaw wrote:
> Also, I was told by someone very smart that fitting OLS to
> data with heteroscedastic errors can make the residuals look
> `more normal' than they really are...  Don't know how true
> that is, though. 

That very smart person is not me, but it happens that I also experimented a
little with this a while ago! Just experiment with artificial data and you
will see what happens: the residuals often look more normal than the error
distribution you introduced into your artificial data... Another consequence
is a biased estimate of the parameters. Indeed, both come together: the
parameters are biased in a direction that lowers the residual sum of squares,
obviously, but also, in some circumstances, in a direction that makes the
residuals look more normal... And that is not (how could it be?) taken into
account in the test of normality. That is, I believe, a second reason why
non-normality of the error may go undetected, yet have a major impact on the
OLS fit.
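One mechanism behind residuals looking "more normal" than the true error is that OLS residuals are linear combinations of the errors (e_hat = (I - H) e), so the averaging pulls them toward normality. A sketch of this effect (my own example, assuming skewed exponential errors and many predictors to make the smoothing visible):

```r
## Sketch: compare Shapiro-Wilk p-values on the true errors vs. on
## the OLS residuals of the same errors regressed on noise predictors.
set.seed(3)
n <- 30; nsim <- 1000
p <- replicate(nsim, {
  x <- matrix(rnorm(n * 10), n)          # 10 predictors -> strong smoothing
  e <- rexp(n) - 1                       # skewed true error, mean zero
  r <- residuals(lm(e ~ x))              # residuals are (I - H) %*% e
  c(error = shapiro.test(e)$p.value,
    resid = shapiro.test(r)$p.value)
})
rowMeans(p > 0.05)   # residuals "pass" the normality test more often
```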

And I am pretty sure there are other reasons, such as error distributed in
both the dependent and the independent variables, yet another violation of
the assumptions made for OLS...

Best regards,


 ) ) ) ) )
( ( ( ( (    Prof. Philippe Grosjean
 ) ) ) ) )
( ( ( ( (    Numerical Ecology of Aquatic Systems
 ) ) ) ) )   Mons-Hainaut University, Pentagone
( ( ( ( (    Academie Universitaire Wallonie-Bruxelles
 ) ) ) ) )   6, av du Champ de Mars, 7000 Mons, Belgium  
( ( ( ( (       
 ) ) ) ) )   phone: +, fax: +
( ( ( ( (    email: Philippe.Grosjean at umh.ac.be
 ) ) ) ) )      
( ( ( ( (    web:   http://www.umh.ac.be/~econum
 ) ) ) ) )

More information about the R-help mailing list