[R] outliers/interval data extraction

Jason Turner jasont at indigoindustrial.co.nz
Thu Feb 20 19:18:02 CET 2003


On Thu, Feb 20, 2003 at 06:54:21PM +0100, Christian Hennig wrote:
... 
> However, a simple straight forward method for outlier identification is  
> median +/- 5.2*mad as suggested by Hampel, Technometrics 27 (1985) 95-107.
...
> x <- data vector
> medx <- median(x)
> madx <- mad(x)
> outliers <- (x<medx-5.2*madx) | (x>medx+5.2*madx)
> selected <- x[!outliers]

I haven't read the paper cited above, but I suspect the author was
talking about the raw (unscaled) mad.  By default, R's mad() rescales
the result to be consistent with the standard deviation under
normality (i.e. it multiplies by constant = 1.4826).  If that's correct
(and I'm quite happy to be wrong), this changes 5.2 to about 3.5
(5.2/1.4826) in the example above.
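
A minimal sketch of the two equivalent ways to write the rule, assuming
the 5.2 in the paper really does refer to the unscaled mad (which I have
not checked).  The data below are made up purely for illustration; only
the constant argument of mad() matters here.

```r
# Illustrative data only: 100 standard-normal draws plus two planted outliers.
set.seed(1)
x <- c(rnorm(100), 8, -9)

medx <- median(x)

# Unscaled mad: median(|x - median(x)|), obtained with constant = 1.
raw_mad <- mad(x, constant = 1)

# R's default mad uses constant = 1.4826.
scaled_mad <- mad(x)

# Option 1: keep the factor 5.2 and use the unscaled mad.
out_raw <- abs(x - medx) > 5.2 * raw_mad

# Option 2: keep R's default mad and shrink the factor to about 3.5,
# since 5.2 / 1.4826 is roughly 3.5.
out_scaled <- abs(x - medx) > 3.5 * scaled_mad

selected <- x[!out_raw]
```

The two options differ only by the rounding of 5.2/1.4826 to 3.5, so
they can disagree only for points sitting almost exactly on the cutoff.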

Cheers

Jason
-- 
Indigo Industrial Controls Ltd.
64-21-343-545
jasont at indigoindustrial.co.nz



