[R] Testing for normality of residuals in a regression model

Philippe Grosjean phgrosjean at sciviews.org
Sat Oct 16 13:18:48 CEST 2004


> Prof Brian Ripley wrote:
> 
> > However, stats 901 or some such tells you that if the distributions
> > have even slightly longer tails than the normal you can get much
> > better estimates than OLS, and this happens even before a test of
> > normality rejects on a sample size of thousands.
> >
> > Robustness of efficiency is much more important than robustness of
> > distribution, and I believe robustness concepts should be in stats 101.
> > (I was teaching them yesterday in the third lecture of a basic course,
> > albeit a graduate course.)

Federico Gherardini answered:
> This is a very interesting discussion. So you are basically 
> saying that it's better to use robust regression methods, 
> without having to worry too much about the distribution of 
> residuals, instead of using standard methods and doing a lot 
> of tests to check for normality? Did I get your point?
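As a quick illustration of Prof Ripley's point about robustness of
efficiency, here is a toy simulation (a setup of my own, not from the
thread): with errors only slightly longer-tailed than normal (t with 3
df), the Huber M-estimator from MASS::rlm() typically shows a smaller
sampling variance for the slope than lm().

## OLS vs. Huber M-estimation when errors are t with 3 df
## (slightly longer tails than normal).
library(MASS)   # for rlm()
set.seed(42)
nsim <- 500; n <- 50
slope.ols <- slope.rob <- numeric(nsim)
for (i in 1:nsim) {
    x <- runif(n)
    y <- 1 + 2 * x + rt(n, df = 3)   # true slope is 2
    slope.ols[i] <- coef(lm(y ~ x))[2]
    slope.rob[i] <- coef(rlm(y ~ x, maxit = 50))[2]
}
var(slope.ols)   # sampling variance of the OLS slope
var(slope.rob)   # typically clearly smaller here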

My feeling is that symmetry is more important than, let's say, a non-zero
(excess) kurtosis of the error. Is this correct? Now the problem is: the
lower the number of observations, the more severe the effect of
non-normality (at least of asymmetry?) can be on the regression, AND at
the same time, the power of tests to detect non-normality drops. So I can
easily imagine situations where non-normality is not detected, yet the
asymmetry is such that the regression is significantly biased... From this
point of view, it is mainly a question of sample size, as the quick sketch
below suggests... But not only, as Andy Liaw points out after the sketch:
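
Here is a minimal simulation of that power problem (a toy setup of my
own; I use shapiro.test() as the normality test and centred exponential
errors as the skewed distribution):

## Rejection rate of the Shapiro-Wilk test for skewed
## (centred exponential) samples, at various sample sizes.
set.seed(1)
power.shapiro <- function(n, nsim = 1000)
    mean(replicate(nsim, shapiro.test(rexp(n) - 1)$p.value < 0.05))
sapply(c(10, 20, 50, 200), power.shapiro)
## The rejection rate is low at the smallest sizes and climbs with n:
## at small n the test often fails to flag a strongly skewed sample.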

Andy Liaw wrote:
> Also, I was told by someone very smart that fitting OLS to
> data with heteroscedastic errors can make the residuals look
> `more normal' than they really are...  Don't know how true
> that is, though. 

That very smart person is not me, but it happens that I also experimented
a little with this a while ago! Just experiment with artificial data and
you will see what happens: the residuals often look more normal than the
error distribution you introduced into your artificial data... Another
consequence is a biased estimate of the parameters. Indeed, both come
together: the parameters are biased in a direction that lowers the
residual sum of squares, obviously, but also, in some circumstances, in a
direction that makes the residuals look more normal... And that is not
(how could it be?) taken into account by the test of normality. That is,
I believe, a second reason why non-normality of the error may go
undetected, yet have a major impact on the OLS regression. A small sketch
of the first effect follows.
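
Here is a minimal sketch of that "supernormality" of residuals (again an
artificial setup of my own, with skewed errors and shapiro.test() as the
normality test): compare the test applied to the true errors with the
same test applied to the fitted residuals.

## Residuals of an OLS fit often look "more normal" than the
## errors that generated them (skewed errors, several regressors).
set.seed(2)
nsim <- 1000; n <- 20; p <- 5
p.err <- p.res <- numeric(nsim)
for (i in 1:nsim) {
    X <- matrix(runif(n * p), n)
    e <- rexp(n) - 1                 # skewed errors with mean zero
    y <- drop(1 + X %*% rep(1, p) + e)
    p.err[i] <- shapiro.test(e)$p.value
    p.res[i] <- shapiro.test(residuals(lm(y ~ X)))$p.value
}
mean(p.err < 0.05)   # rejection rate on the true errors
mean(p.res < 0.05)   # typically lower: the fit "normalizes" residuals

One way to see why: each residual is a linear combination of all the
errors (through the hat matrix), and that averaging pulls the residuals
toward normality.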

And I am pretty sure there are other reasons, like error present in the
independent variables as well as in the dependent one (errors-in-variables),
another violation of the assumptions behind OLS... A last sketch below
shows that effect.
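
For completeness, a toy errors-in-variables example (my own setup):
measurement error added to the regressor biases the OLS slope toward
zero, and no normality check on the residuals will reveal it.

## Measurement error in x attenuates the OLS slope toward zero.
set.seed(3)
n <- 1000
x.true <- rnorm(n)
y <- 1 + 2 * x.true + rnorm(n)
x.obs <- x.true + rnorm(n)   # x measured with error (same variance)
coef(lm(y ~ x.true))[2]  # close to the true slope, 2
coef(lm(y ~ x.obs))[2]   # attenuated, roughly 2 * 1/(1 + 1) = 1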

Best regards,

Philippe

..............................................<°}))><........
 ) ) ) ) )
( ( ( ( (    Prof. Philippe Grosjean
 ) ) ) ) )
( ( ( ( (    Numerical Ecology of Aquatic Systems
 ) ) ) ) )   Mons-Hainaut University, Pentagone
( ( ( ( (    Academie Universitaire Wallonie-Bruxelles
 ) ) ) ) )   6, av du Champ de Mars, 7000 Mons, Belgium  
( ( ( ( (       
 ) ) ) ) )   phone: + 32.65.37.34.97, fax: + 32.65.37.33.12
( ( ( ( (    email: Philippe.Grosjean at umh.ac.be
 ) ) ) ) )      
( ( ( ( (    web:   http://www.umh.ac.be/~econum
 ) ) ) ) )
..............................................................



