[RsR] Outlier identification [FWD]
Martin Maechler
maechler at stat.math.ethz.ch
Mon Sep 1 12:47:36 CEST 2008
This message was sent to me privately; I'm replying to the full
R-SIG-robust audience.
------- start of forwarded message -------
From: Luis Orlindo Tedeschi <luis.tedeschi using gmail.com>
To: Martin Maechler <maechler using stat.math.ethz.ch>
Subject: Re: [RsR] CRAN task view "robust"
Date: Sun, 31 Aug 2008 08:48:27 -0500
Dear Mr. Maechler, I am very happy you provided information about robust
stats. The question I have for you is this: I visited your site, but I
could not identify a procedure that checks the raw data and tries to
identify outliers. For instance, one could use the normal distribution
(mean ± 1.96 SD) and eliminate the values outside that range, or use quartiles.
Is this implemented in these packages, and do you have any other method
to check raw data? Thanks a lot.
Luis O. Tedeschi, PhD, PAS
Assistant Professor
Texas A&M University
[........]
------- end of forwarded message -------
Short answer: I'd recommend using
rlm(y ~ 1, method = "MM")   # package 'MASS'
or
lmrob(y ~ 1)                # package 'robustbase'
and looking at the "robustness weights" returned.
But really, you should *NOT* detect and reject outliers
and then continue your analysis as if you hadn't done that.
*Rather*, do a fully robust analysis (as, e.g., rlm() would do).
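A minimal sketch of the short answer in R, using only package MASS (shipped with R). The sample y is hypothetical: 30 standard normal values plus two gross errors. The intercept-only MM fit gives a robust location estimate, and its component w holds the robustness weights from the final reweighting step; observations with weights near 0 are the ones the fit effectively ignores. For lmrob() from robustbase, the analogous weights are stored in the fitted object's rweights component.

```r
library(MASS)  # provides rlm()

set.seed(1)
## Hypothetical data: 30 N(0, 1) values followed by two gross errors
y <- c(rnorm(30), 8, 12)

## MM-type robust fit of an intercept-only model, i.e. a robust location estimate
fit <- rlm(y ~ 1, method = "MM")

coef(fit)   # robust location, stays with the bulk of the data
mean(y)     # classical mean, pulled towards the two gross errors

## Robustness weights in [0, 1]: close to 1 = used essentially fully,
## close to 0 = effectively downweighted (bisquare psi for method = "MM")
w <- fit$w
round(w, 2)
which(w < 0.5)  # observations the fit treats as suspicious
```

The weights are a by-product of one robust fit, not a license to delete points and refit by least squares; the robust estimate itself, with the standard errors from summary(fit), is the analysis.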
Longer answer:
A typical procedure of
1) outlier detection,
2) dropping the outliers from the data, and then, with the remaining data,
3a) estimation and
3b) inference [tests, confidence intervals, diagnostics]
is "BAD",
1) since the conclusions can be quite WRONG
{all P-values / all inference of the combined procedure are wrong,
even when the underlying data are truly normally distributed};
2) since the procedure is quite unstable,
particularly in the important and interesting case of
"borderline outliers".
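Point 1) can be checked with a small simulation, sketched here in R under hypothetical settings (n = 20, 5000 replicates, N(0, 1) data, so there are no real outliers at all). It compares the coverage of the nominal 95% t-interval from t.test() on the full sample with its coverage after first applying the "1.96 SD" rejection rule from the question.

```r
set.seed(42)
n <- 20            # hypothetical sample size
n_rep <- 5000      # hypothetical number of simulated samples
true_mean <- 0

## Does the nominal 95% t-interval for the mean contain the true mean?
covers <- function(x) {
  ci <- t.test(x)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
}

cover_all <- logical(n_rep)  # interval on the full sample
cover_rej <- logical(n_rep)  # interval after dropping |x - mean| > 1.96 * sd

for (i in seq_len(n_rep)) {
  x <- rnorm(n, mean = true_mean)            # truly normal data
  keep <- abs(x - mean(x)) <= 1.96 * sd(x)   # the "1.96 SD" rejection rule
  cover_all[i] <- covers(x)
  cover_rej[i] <- covers(x[keep])
}

c(full = mean(cover_all), after_rejection = mean(cover_rej))
```

Because the rejection step uses the same data as the interval, the retained values look less variable than the population really is, so the interval after rejection tends to be too narrow and its coverage falls below the nominal 95%, even though every sample came from a normal distribution.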
There's much more to say about this.
One good reference, probably not read and understood
often enough, is
@ARTICLE{HamF85,
author = "Hampel, F.",
title = "The breakdown points of the mean combined with some
rejection rules",
journal = "Technometrics",
year = 1985,
volume = 27,
pages = "95--107",
}
-------
Martin Maechler, ETH Zurich