[R] pros and cons of "robust regression"? (i.e. rlm vs lm)
Berton Gunter
gunter.berton at gene.com
Thu Apr 6 19:00:36 CEST 2006
Thanks, Andy. Well said. Excellent points. The final weights from rlm serve
this diagnostic purpose, of course.
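For example, with MASS::rlm the final IRLS weights can be pulled out of the
fitted object and scanned directly (a minimal sketch using the stackloss data
from the rlm help page; any rlm fit works the same way):

  library(MASS)
  fit <- rlm(stack.loss ~ ., data = stackloss)  # Huber M-estimate by default
  round(fit$w, 2)     # final IRLS weights, one per observation
  which(fit$w < 0.5)  # heavily downweighted points -- worth a close look
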
-- Bert
> -----Original Message-----
> From: Liaw, Andy [mailto:andy_liaw at merck.com]
> Sent: Thursday, April 06, 2006 9:56 AM
> To: 'Berton Gunter'; 'r user'; 'rhelp'
> Subject: RE: [R] pros and cons of "robust regression"? (i.e.
> rlm vs lm)
>
> To add to Bert's comments:
>
> - "Normalizing" data (e.g., subtracting mean and dividing by
> SD) can help
> numerical stability of the computation, but that's mostly
> unnecessary with
> modern hardware. As Bert said, that has nothing to do with
> robustness.
>
> - Instead of _replacing_ lm() with rlm() or another robust procedure,
> I'd do both (a short sketch follows below). Some scientists regard it
> as bad science to let a robust procedure omit data points automatically
> (e.g., by assigning them essentially zero weight) and to simply trust
> the result, and I think they have a point. Using a robust procedure
> does not free one from examining the data carefully and looking at
> diagnostics. Careful treatment of outliers is especially important, I
> think, for data coming from a confirmatory experiment. If the
> conclusion you draw depends on downweighting or omitting certain data
> points, you ought to have a very good reason for doing so. I think it
> cannot be over-emphasized how important it is not to take outlier
> deletion lightly. I've seen many cases where what originally looked
> like an outlier turned out to be legitimate data, and omitting it just
> led to an overly optimistic assessment of variability.
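>
> A minimal sketch of "doing both", assuming a data frame dat with a
> response y and a single predictor x (hypothetical names):
>
>   library(MASS)                       # rlm() lives in MASS
>   fit_ls  <- lm(y ~ x, data = dat)    # ordinary least squares
>   fit_rob <- rlm(y ~ x, data = dat)   # Huber M-estimate (default psi)
>   cbind(OLS = coef(fit_ls), Robust = coef(fit_rob))  # do conclusions differ?
>   which(fit_rob$w < 0.5)              # observations heavily downweighted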
>
> Andy
>
> From: Berton Gunter
> >
> > There is a **Huge** literature on robust regression,
> > including many books that you can search on at e.g. Amazon. I
> > think it fair to say that we have known since at least the
> > 1970's that practically any robust downweighting procedure
> > (see, e.g "M-estimation") is preferable (more efficient,
> > better continuity properties, better estimates) to trimming
> > "outliers" defined by arbitrary threshholds. An excellent but
> > now probably dated introductory discussion can be found in
> > "UNDERSTANDING ROBUST AND EXPLORATORY DATA ANALYSIS" edited
> > by Hoaglin, Mosteller, and Tukey.
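> >
> > A rough sketch of the contrast, again for a hypothetical data frame
> > dat with response y and predictor x:
> >
> >   library(MASS)
> >   ## manual trimming: hard 0/1 weights at an arbitrary cutoff
> >   r      <- resid(lm(y ~ x, data = dat))
> >   keep   <- abs(r) < 2 * sd(r)              # arbitrary 2-SD rule
> >   fit_tr <- lm(y ~ x, data = dat[keep, ])
> >   ## M-estimation: smooth downweighting instead of all-or-nothing deletion
> >   fit_m  <- rlm(y ~ x, data = dat, psi = psi.huber)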
> >
> > The rub in all this is that nice small-sample inference
> > results go out the window, though bootstrapping can help with
> > this. Nevertheless, for a variety of reasons, my
> > recommendation is simply to **never** use lm and **always**
> > use rlm (with maybe a few minor caveats). Many would disagree
> > with this, however.
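> >
> > For instance, a case-resampling bootstrap of an rlm fit can be set up
> > with the boot package (same hypothetical dat, y, and x as above):
> >
> >   library(MASS)
> >   library(boot)
> >   bcoef <- function(data, idx) coef(rlm(y ~ x, data = data[idx, ], maxit = 50))
> >   b <- boot(dat, bcoef, R = 999)         # 999 case resamples
> >   boot.ci(b, type = "perc", index = 2)   # percentile CI for the slope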
> >
> > I don't think "normalizing" data as it's conventionally used
> > has anything to do with robust regression, btw.
> >
> > -- Bert Gunter
> > Genentech Non-Clinical Statistics
> > South San Francisco, CA
> >
> > "The business of the statistician is to catalyze the
> > scientific learning process." - George E. P. Box
> >
> >
> >
> > > -----Original Message-----
> > > From: r-help-bounces at stat.math.ethz.ch
> > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of r user
> > > Sent: Thursday, April 06, 2006 8:51 AM
> > > To: rhelp
> > > Subject: [R] pros and cons of "robust regression"? (i.e.
> rlm vs lm)
> > >
> > > Can anyone comment or point me to a discussion of the
> > > pros and cons of robust regressions, vs. a more
> > > "manual" approach to trimming outliers and/or
> > > "normalizing" data used in regression analysis?
> > >