[R] pros and cons of "robust regression"? (i.e. rlm vs lm)

Berton Gunter gunter.berton at gene.com
Thu Apr 6 19:00:36 CEST 2006


Thanks, Andy. Well said. Excellent points. The final weights from rlm serve
this diagnostic purpose, of course.
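
For anyone who wants to see this concretely, a minimal sketch (the built-in
stackloss data is only a stand-in example):

    library(MASS)                                # for rlm()
    fit <- rlm(stack.loss ~ ., data = stackloss) # Huber M-estimate by default
    round(fit$w, 2)     # final IWLS weights; values well below 1 flag
                        # observations the fit downweighted
    which(fit$w < 0.5)  # candidate points worth a closer look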

-- Bert
 

> -----Original Message-----
> From: Liaw, Andy [mailto:andy_liaw at merck.com] 
> Sent: Thursday, April 06, 2006 9:56 AM
> To: 'Berton Gunter'; 'r user'; 'rhelp'
> Subject: RE: [R] pros and cons of "robust regression"? (i.e. rlm vs lm)
> 
> To add to Bert's comments:
> 
> -  "Normalizing" data (e.g., subtracting mean and dividing by 
> SD) can help
> numerical stability of the computation, but that's mostly 
> unnecessary with
> modern hardware.  As Bert said, that has nothing to do with 
> robustness.
> 
> -  Instead of _replacing_ lm() with rlm() or another robust procedure, I'd
> do both.  Some scientists view it as bad science to let a robust procedure
> omit data points automatically (e.g., by assigning them essentially zero
> weight) and to just trust the result, and I think they have a point.  Use
> of a robust procedure does not free one from examining the data carefully
> and looking at diagnostics.  Careful treatment of outliers is especially
> important, I think, for data coming from a confirmatory experiment.  If
> the conclusion you draw depends on downweighting or omitting certain data
> points, you ought to have a very good reason for doing so.  I think it
> cannot be over-emphasized how important it is not to take outlier deletion
> lightly.  I've seen many cases where what originally seemed like outliers
> turned out to be legitimate data, and omitting them just led to an overly
> optimistic assessment of variability.
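> 
> For illustration, something along these lines (the built-in stackloss
> data is only a stand-in here):
> 
>   library(MASS)
>   fit.ls  <- lm(stack.loss ~ ., data = stackloss)
>   fit.rob <- rlm(stack.loss ~ ., data = stackloss)
>   ## do the two fits tell the same story?
>   cbind(LS = coef(fit.ls), Robust = coef(fit.rob))
>   ## which observations did rlm downweight, and do the usual
>   ## least-squares diagnostics point at the same ones?
>   cbind(weight = round(fit.rob$w, 2), rstudent = round(rstudent(fit.ls), 2))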
> 
> Andy
> 
> From: Berton Gunter
> > 
> > There is a **Huge** literature on robust regression, including many
> > books that you can search for at, e.g., Amazon.  I think it fair to say
> > that we have known since at least the 1970s that practically any robust
> > downweighting procedure (see, e.g., "M-estimation") is preferable (more
> > efficient, better continuity properties, better estimates) to trimming
> > "outliers" defined by arbitrary thresholds.  An excellent but now
> > probably dated introductory discussion can be found in "Understanding
> > Robust and Exploratory Data Analysis", edited by Hoaglin, Mosteller,
> > and Tukey.
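> > 
> > As a toy illustration of the contrast (simulated data with one planted
> > gross error; the cutoff and the Huber choice are just for show):
> > 
> >   library(MASS)
> >   set.seed(1)
> >   x <- rnorm(50)
> >   y <- 2 + 3 * x + rnorm(50)
> >   y[1] <- y[1] + 15                  # plant one gross outlier
> >   ## hard trimming at an arbitrary cutoff: all-or-nothing
> >   r <- residuals(lm(y ~ x))
> >   coef(lm(y ~ x, subset = abs(r) < 2 * sd(r)))
> >   ## Huber M-estimate: smooth downweighting instead
> >   coef(rlm(y ~ x, psi = psi.huber))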
> > 
> > The rub in all this is that nice small-sample inference results go out
> > the window, though bootstrapping can help with this.  Nevertheless, for
> > a variety of reasons, my recommendation is simply to **never** use lm
> > and **always** use rlm (with maybe a few minor caveats).  Many would
> > disagree with this, however.
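> > 
> > For interval estimates, one possibility is a case-resampling bootstrap,
> > e.g. via the boot package (stackloss again purely as a stand-in):
> > 
> >   library(MASS); library(boot)
> >   bfun <- function(d, i) coef(rlm(stack.loss ~ ., data = d[i, ], maxit = 50))
> >   b <- boot(stackloss, bfun, R = 999)
> >   boot.ci(b, type = "perc", index = 2)  # percentile CI for the Air.Flow slope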
> > 
> > I don't think "normalizing" data as it's conventionally used 
> > has anything to do with robust regression, btw.
> > 
> > -- Bert Gunter
> > Genentech Non-Clinical Statistics
> > South San Francisco, CA
> >  
> > "The business of the statistician is to catalyze the 
> > scientific learning process."  - George E. P. Box
> >  
> >  
> > 
> > > -----Original Message-----
> > > From: r-help-bounces at stat.math.ethz.ch
> > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of r user
> > > Sent: Thursday, April 06, 2006 8:51 AM
> > > To: rhelp
> > > Subject: [R] pros and cons of "robust regression"? (i.e. rlm vs lm)
> > > 
> > > Can anyone comment or point me to a discussion of the pros and cons
> > > of robust regression, vs. a more "manual" approach to trimming
> > > outliers and/or "normalizing" data used in regression analysis?
> > > 



