[R] pros and cons of "robust regression"? (i.e. rlm vs lm)
RRoa at fisheries.gov.fk
Thu Apr 6 18:09:35 CEST 2006
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Berton Gunter
Sent: 06 April 2006 14:22
To: 'r user'; 'rhelp'
Subject: Re: [R] pros and cons of "robust regression"? (i.e. rlm vs lm)
There is a **Huge** literature on robust regression, including many books that you can search for at e.g. Amazon. I think it fair to say that we have known since at least the 1970s that practically any robust downweighting procedure (see, e.g., "M-estimation") is preferable (more efficient, better continuity properties, better estimates) to trimming "outliers" defined by arbitrary thresholds. An excellent but now probably dated introductory discussion can be found in "UNDERSTANDING ROBUST AND EXPLORATORY DATA ANALYSIS", edited by Hoaglin, Mosteller, and Tukey.
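To make the "downweight rather than trim" point concrete, here is a minimal sketch of M-estimation by iteratively reweighted least squares with the Huber weight function. I give it in Python/numpy so it runs standalone; the function names and the tuning constant k = 1.345 are my own illustrative choices, not the internals of MASS::rlm, which does the real job in R.

```python
import numpy as np

def huber_irls(X, y, k=1.345, iters=50, tol=1e-8):
    """Huber M-estimation via iteratively reweighted least squares (IRLS)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # start from OLS
    for _ in range(iters):
        r = y - X @ beta
        # robust residual scale from the MAD (0.6745 makes it consistent
        # for a normal distribution)
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-8)
        u = np.abs(r / s)
        # Huber weights: points within k scale units keep full weight,
        # larger residuals are smoothly downweighted, never deleted
        w = np.where(u <= k, 1.0, k / u)
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(Xw.T @ X, Xw.T @ y)   # weighted LS step
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Line y = 1 + 2x with small alternating noise and one gross outlier
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x + 0.1 * (-1.0) ** np.arange(10)
y[9] += 50.0                                             # gross outlier
X = np.column_stack([np.ones_like(x), x])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = huber_irls(X, y)
```

On this example the single outlier drags the OLS slope well away from 2, while the Huber fit assigns it a weight near zero and recovers the slope of the clean points; no arbitrary rejection threshold is ever applied.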
In the mixture-of-distributions approach of ADMB's robust_regression(x,y,a) command, there is no need to abandon the likelihood function for a more general objective function. The outliers are assumed to come from a second, contaminating distribution, governed by an extra parameter a, and a proper, more complete likelihood is then used. The mixture-of-distributions approach also seems more interpretable, being more closely tied to the physical mechanisms that generate departures from the distributional assumptions. In a paper on nonlinear models for the growth of certain marine animals, where I used ADMB robust regression, I argued that the outliers were produced by human error in reading age from certain hard structures in the animals' bodies. This was consistent with the structure of the likelihood, a mixture of a normal distribution and a contaminating distribution with fatter tails, operating mostly at higher values of the predictor variable (age).
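I won't reproduce ADMB's actual likelihood here, but the mixture idea can be sketched in a few lines. In this illustration (mine, not ADMB's: the contamination fraction p, the variance-inflation factor c, and the function names are all assumptions) the error is N(0, sigma^2) with probability 1-p and comes from a fatter contaminating N(0, (c*sigma)^2) with probability p, and the full mixture likelihood is maximized by EM. A crude median-of-pairwise-slopes start keeps EM from settling into a wide-sigma mode that simply absorbs the outliers.

```python
import numpy as np

def theil_sen_start(x, y):
    """Crude robust start (median of pairwise slopes), so EM is not
    trapped by the very outliers it is meant to model."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n)]
    b1 = np.median(slopes)
    b0 = np.median(y - b1 * x)
    return np.array([b0, b1])

def mixture_regression(X, y, beta, p=0.05, c=10.0, iters=100):
    """EM for regression whose errors follow a two-component mixture:
    N(0, sigma^2) with prob. 1-p, N(0, (c*sigma)^2) with prob. p."""
    n = len(y)
    r = y - X @ beta
    sigma = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-6)
    for _ in range(iters):
        r = y - X @ beta
        # E-step: posterior probability that each observation is "clean"
        f1 = (1 - p) * np.exp(-0.5 * (r / sigma) ** 2) / sigma
        f2 = p * np.exp(-0.5 * (r / (c * sigma)) ** 2) / (c * sigma)
        g = f1 / (f1 + f2 + 1e-300)
        # M-step: weighted least squares; a suspected outlier keeps
        # weight ~1/c^2 rather than being deleted outright
        w = g + (1.0 - g) / c ** 2
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)
        sigma = np.sqrt(np.sum(w * (y - X @ beta) ** 2) / n)
    return beta, sigma

# Line y = 1 + 2x with small noise; one "reading error" at the largest x,
# mimicking ageing errors that act mostly at high predictor values
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x + 0.1 * (-1.0) ** np.arange(10)
y[9] += 50.0
X = np.column_stack([np.ones_like(x), x])

beta_mix, sigma_mix = mixture_regression(X, y, theil_sen_start(x, y))
```

The point of the exercise is the interpretability argument above: every quantity in the fit (p, c, the two densities) corresponds to a statement about the mechanism producing the bad observations, and the outlier is accommodated by the contaminating component of a proper likelihood rather than by an ad hoc loss function.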